ED 187 930                                        CE 025 629

DOCUMENT RESUME

AUTHOR          Spirer, Janet E., Ed.
TITLE           Performance Testing: Issues Facing Vocational
                Education. Research and Development Series No. 190.
INSTITUTION     Ohio State Univ., Columbus. National Center for
                Research in Vocational Education.
SPONS AGENCY    Bureau of Occupational and Adult Education (DHEW/OE),
                Washington, D.C.
BUREAU NO       498NH90003
PUB DATE        [80]
CONTRACT        300-78-0032
NOTE            192p.
AVAILABLE FROM  National Center Publications, The National Center for
                Research in Vocational Education, The Ohio State
                University, 1960 Kenny Rd., Columbus, OH 43210
                ($11.00)

EDRS PRICE      MF01/PC08 Plus Postage.
DESCRIPTORS     *Educational Philosophy; *Legal Responsibility;
                *Performance Tests; *Program Implementation; *Test
                Construction; Testing; Testing Programs; *Vocational
                Education

ABSTRACT

Addressing issues facing vocational education on the topic of
performance testing, this handbook consists of a collection of
seventeen commissioned papers and reactions to the papers. Two papers
are presented on each of the following types of issues that must be
considered before a performance test can be constructed:
philosophical, technical, legal, and implementation issues. Authors
were selected to form a multidisciplinary group to address each
issue, and a reaction to the two papers presented on each issue is
included. The two papers on philosophical issues are authored by
Henry Borow and Jack C. Willers; reactions are given by John F.
Thompson. The two papers on technical issues are authored by Evelyn
Perloff and Raymond Klein; reactions are given by Samuel A.
Livingston. The two papers on legal issues are authored by Paul L.
Tractenberg and Diana Pullin; reactions are given by William G. Buss.
The two papers on implementation issues are authored by H. Brinton
Milward and Curtis R. Finch; reactions are given by Janet E. Spirer.
Finally, two papers are included that discuss the implications of the
contents of all the papers for vocational education. These papers are
authored by Robert E. Spillman, Charles D. Wade, and Nellie Carr
Thorogood; reactions are given by Marvin R. Rasmussen. (BM)



Reproductions supplied by EDRS are the best that can be made
from the original document.



Research and Development Series No. 190



PERFORMANCE TESTING:
ISSUES FACING VOCATIONAL EDUCATION



compiled and edited by
Janet E. Spirer



The National Center for Research in Vocational Education
The Ohio State University
1960 Kenny Road
Columbus, Ohio 43210



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL
INSTITUTE OF EDUCATION POSITION OR POLICY.






THE NATIONAL CENTER MISSION STATEMENT

The National Center for Research in Vocational Education's mission is
to increase the ability of diverse agencies, institutions, and organizations
to solve educational problems relating to individual career planning,
preparation, and progression. The National Center fulfills its mission by:

• Generating knowledge through research

• Developing educational programs and products

• Evaluating individual program needs and outcomes

• Installing educational programs and products

• Operating information systems and services

• Conducting leadership development and training programs



FUNDING INFORMATION

Project Title:
    The National Center for Research in Vocational Education:
    Evaluation Handbook: Performance Testing: Issues Facing
    Vocational Education

Contract Number:
    OEC-300-78-0032

Project Number:
    498 NH 90003

Educational Act Under Which the Funds Were Administered:
    Education Amendments of 1976, P.L. 94-482

Source of Contract:
    Department of Health, Education, and Welfare, United States Office
    of Education, Bureau of Occupational and Adult Education,
    Washington, DC

Project Officer:
    Paul Manchak

Contractor:
    The National Center for Research in Vocational Education, The
    Ohio State University, Columbus, Ohio 43210

Executive Director:
    Robert E. Taylor

Disclaimer:
    The material for this publication was prepared pursuant to a
    contract with the Bureau of Occupational and Adult Education,
    U.S. Department of Health, Education, and Welfare. Contractors
    undertaking such projects under government sponsorship are
    encouraged to express freely their judgment in professional and
    technical matters. Points of view or opinions do not, therefore,
    necessarily represent official U.S. Office of Education position or
    policy.

Discrimination Prohibited:
    Title VI of the Civil Rights Act of 1964 states: "No person in the
    United States shall, on the grounds of race, color, or national
    origin, be excluded from participation in, be denied the benefits of,
    or be subjected to discrimination under any program or activity
    receiving federal financial assistance." Title IX of the Education
    Amendments of 1972 states: "No person in the United States shall,
    on the basis of sex, be excluded from participation in, be denied
    the benefits of, or be subjected to discrimination under any
    education program or activity receiving federal financial
    assistance." Therefore, the National Center for Research in Vocational
    Education, like every program or activity receiving financial
    assistance from the U.S. Department of Health, Education, and
    Welfare, must operate in compliance with these laws.




TABLE OF CONTENTS

FOREWORD

PREFACE

CHAPTER ONE: INTRODUCTION

    Performance Testing: An Overview
        Stephen J. Slater

CHAPTER TWO: PHILOSOPHICAL ISSUES

    Performance Testing and Social Responsibility: An Issues Analysis
        Henry Borow

    Philosophical Issues in Performance Testing
        Jack C. Willers

    Comments on the Philosophical Issues in Performance Testing
        John F. Thompson

CHAPTER THREE: TECHNICAL ISSUES

    Technical Considerations: Validity, Reliability, Efficiency, and
    Observer/Rater Variability
        Evelyn Perloff

    Some Selected Technical Issues Related to Performance Testing
        Raymond Klein

    Comments on the Technical Issues in Performance Testing
        Samuel A. Livingston

CHAPTER FOUR: LEGAL ISSUES

    Legal Implications of Performance Testing in Vocational Education:
    An Overview
        Paul L. Tractenberg

    Performance Testing in Vocational Education: Lessons to Be Learned
    from the Minimum Competency Testing Movement
        Diana Pullin

    Comments on the Legal Issues in Performance Testing
        William G. Buss

CHAPTER FIVE: IMPLEMENTATION ISSUES

    Performance Testing as an Organizational Innovation
        H. Brinton Milward

    Considerations in the Implementation of Performance Testing
        Curtis R. Finch

    Comments on the Implementation Issues in Performance Testing
        Janet E. Spirer

CHAPTER SIX: IMPLICATIONS FOR VOCATIONAL EDUCATION

    Implications of the Issues for Vocational Education:
    A Viewpoint
        Robert E. Spillman and Charles D. Wade

    Implications of Performance Testing on Vocational Education
        Nellie Carr Thorogood

    Implications for Vocational Education:
    A Third Point of View
        Marvin R. Rasmussen

CHAPTER SEVEN: GLOSSARY

CHAPTER EIGHT: CONTRIBUTORS



FOREWORD 





Performance testing to measure student achievement is one evaluation method being
advocated by a number of groups. However, the trends toward accountability of all public
programs and the advent of such movements as minimum competency testing have raised
concerns with which vocational education must deal if it is to expand its use of performance
testing.

Performance Testing: Issues Facing Vocational Education addresses some of these
concerns. Using a multidisciplinary approach, seventeen persons were selected to provide their
views on one of four areas (philosophical, technical, legal, and implementation) and the
implications of these issues for vocational education. The multidisciplinary approach resulted in
a mix of thoughts designed to leave the reader with some new ideas and
other ways to look at some old ideas.

The National Center expresses its appreciation to the seventeen contributors to the
handbook: Henry Borow, University of Minnesota; William G. Buss, University of Iowa; Curtis R.
Finch, Virginia Polytechnic Institute and State University; Raymond S. Klein, National
Occupational Competency Testing Institute; Samuel A. Livingston, Educational Testing Service;
H. Brinton Milward, University of Kentucky; Evelyn Perloff, University of Pittsburgh; Diana C.
Pullin, Center for Law and Education, Inc.; Marvin R. Rasmussen, Portland Public Schools;
Stephen J. Slater, Oregon Department of Education; Robert E. Spillman, Kentucky Bureau of
Vocational Education; John F. Thompson, University of Wisconsin-Madison; Nellie Carr
Thorogood, San Antonio College; Paul L. Tractenberg, Rutgers University; Charles D. Wade,
Kentucky Bureau of Vocational Education; and Jack C. Willers, George Peabody College.

J. Stanley Ahmann, Iowa State University, Kenneth Eddy, Vocational-Technical Education
Consortium of States, and William Osborn, Human Resources Research Organization, provided
useful suggestions on an earlier draft of the manuscript.

The National Center is particularly indebted to Janet E. Spirer, who edited this handbook and
directed the project with assistance from Nancy F. Stephens, program assistant, and Ron
Schilling, graduate research associate. Recognition is also due to N. L. McCaslin, associate
director for evaluation and policy, and Floyd L. McKinney, program director, for their assistance
throughout the project. In addition, appreciation is extended to Nancy Powell and Carolyn
Hamilton, who typed and edited the manuscript, respectively.

On behalf of the National Center, I want to express appreciation to the Bureau of
Occupational and Adult Education, U.S. Office of Education, for sponsoring this evaluation
handbook.


Robert E. Taylor
Executive Director

The National Center for Research 
in Vocational Education 



PREFACE 




A Bit of History

It was almost a century ago that the infant science of psychology began to put into serious
practice Alexander Pope's dictum, "The proper study of mankind is man." Wilhelm Wundt
established his psychological laboratory in Leipzig, Germany, in 1879. James McKeen Cattell, a
young American who studied with Wundt, was convinced that the inconsistencies in the
laboratory's psychological findings, which Wundt himself insisted were mainly errors of
measurement, were in reality indications of important variations in human mental makeup.
Pursuing his studies of human responses to simple mental tasks, first at the University of
Pennsylvania around the year 1900 and, a few years later, at Columbia University, Cattell
essentially launched the objective testing movement in America and is generally recognized,
along with an older contemporary, Sir Francis Galton, as a founder of the subscience of the
psychology of individual differences.

Early application of measurement rules to the objective and systematic observation of
student achievement appeared in the work of J.M. Rice in 1897. Rice constructed a spelling test
and sampled the spelling abilities of pupils in twenty-one cities. The popularity of objective tests
of educational achievement to measure students' subject-matter mastery grew rapidly, and
nationally standardized testing programs were subsequently adopted, not without controversy
and abuses. For many decades, achievement testing took the form of paper-and-pencil tests of
cognitive objectives (primarily information) of classroom instruction. Performance tests of
training outcomes, as we know them today, occupied a relatively obscure place in the early
history of educational testing.

A parallel development in the testing movement within psychology did, however, produce
technical advances that expanded the range of testing practices in the schools. The individual
mental testing methods pioneered by Alfred Binet in France proved impractical for the
large-scale testing of army recruits in World War I. A five-man committee, headed by Robert M.
Yerkes, was appointed by the American Psychological Association to develop a group test of
general mental ability. The product of this team effort was the Army Alpha, an instrument that
proved to be an expedient way of screening people for training as officers and technical
personnel.

The Army Beta intelligence tests were constructed for the testing of illiterate recruits, a
device that foreshadowed the appearance of a wide array of nonverbal and manual tests. To
assign personnel to such duties as coding, baking, and mechanical maintenance, the army
developed a series of oral trade tests, these representing in all likelihood the first mass use of
performance-like tests for purposes of certifying occupational fitness. Questions from an oral
trade test for the position of machinist, die sinker illustrate the knowledge approach used: "What
will happen to the dies if they are overheated and cooled too quickly?" "What is the usual finish
allowance on a drop forging?" "What machine is used for cutting a straight groove between two
[illegible] holes?"






In the 1920s and 1930s numerous tests of psychomotor abilities and nonverbal problem
solving emerged, such as the Minnesota Mechanical Ability Tests. The technology of nonverbal
skill tests received further significant impetus from the efforts of the World War II army aviation
psychology testing research program that produced the S.A.M. Complex Co-ordinator for the
selection of military pilots. Although the evidence is not conclusive, it seems probable that the
prominence of such tests was later instrumental in shifting the testing emphasis within vocational
education away from the exclusive use of paper-and-pencil information measures and toward
"hands-on" performance-type measures. We can be more confident about the significant impact
of military personnel research during the 1950s and 1960s. The meticulous and sophisticated
studies to develop and assess new performance testing procedures for technical training
programs had direct relevance to the improvement of measures of student competence in
vocational education.

Our View of Performance Testing

The literature is replete with definitions of performance tests and performance testing, such
as:¹

• An applied performance test . . . measures performance on tasks requiring the application of
learning in an actual or simulated setting. Either the test stimulus, the desired response, or
both are intended to lend a high degree of realism to the test situation. The identifying
difference between applied performance and other types of tests is the degree to which
testing procedures approximate the reality of the situation in which the actual task would be
performed.²

• A performance test is a template, a template modeled from a job task and used to gauge
the similarity of a trained behavior to the demands of the job task.³

• In vocational and technical education the term performance test expressly denotes a
measure of competency (skill level) in some specified field of occupational training. The
performance tests may measure the test subject's handling of the work process or the
quality of the work product or both.⁴

• A test of the class we have designated as performance and product evaluation is one in which
some criterion situation is simulated to a much greater degree than is represented by the
usual paper-and-pencil test.⁵

This handbook will not offer another definition of performance testing. Rather, the authors of
the papers in this handbook identified three attributes that they feel undergird performance
testing in vocational education. First, performance testing procedures attempt to approximate an
actual situation drawn from a specific occupational context. Second, performance testing can
cover some or all of the actual work situations through cognitive, affective, and psychomotor
domains from a process and/or product perspective. Third, performance testing results in a
variety of outcomes, such as student certification, program evaluation, instructional planning,
and information for constituencies. Thus, the authors perceived performance testing in
vocational education as an evaluative tool with a variety of possible outcome measures. It differs
from other types of testing in that a performance test assesses a portion or all of an actual work
setting by attempting to approximate the actual work setting.






Need for the Handbook 

The need for this handbook arises from a variety of sources. For example, the stress on
accountability in publicly funded programs is reflected in the federal rules and regulations
whereby state boards are required to measure student achievement by standard occupational
proficiency measures, criterion-referenced tests, and other examinations of students' skills,
knowledge, attitudes, and readiness for entering employment successfully. Simultaneously,
educators are attempting to respond to the perceived ineffectiveness of evaluation efforts to date
by more closely matching the information needs of decision-makers to the evaluation questions
asked and methods used to gather and interpret the information. In response to these trends and
others, this handbook was designed to help teacher educators and state and local education
agency personnel respond to their evaluation responsibilities.



The Approach 



This handbook consists of a collection of commissioned papers and reactions to the papers
that focus on four types of issues that must be considered before a performance test can be
constructed. The issues include: Philosophical Issues, Technical Issues, Legal Issues, and
Implementation Issues. In addition, two papers are included that discuss the implications of the
contents of all of the papers for vocational education.

In designing this handbook, we have compiled a multidisciplinary group of authors to
address each issue area. Because the issue areas themselves are broad, our space is limited, and
the authors are drawn from diverse disciplines, you may find that the authors did not address all
relevant aspects of the issue area. To partially compensate, we have included a Comments
section for each issue area that consists of a reaction to the two papers. However, we feel that as
a collection, the handbook will provide you with a foundation on issues related to performance
testing, and testing in general, that must be considered before a performance test is constructed
and implemented. We believe that the multidisciplinary approach will open new insights for you
as you read about each issue area from these different perspectives. The mix of authors should
leave you with some new ideas and other ways to look at some old ideas.






Notes

1. Borow, H. "Philosophical, Practical, and Technical Issues Pertaining to Performance Testing in
Vocational Education." Unpublished manuscript. (Columbus, Ohio: The National Center for
Research in Vocational Education, 1979), pp. 1-3.

2. Sanders, J.R. & Sachse, T.P. "Applied Performance Testing in the Classroom." Journal of
Research and Development in Education, 1977, 10(3), 92-104.

3. Osborn, W.C. Developing Performance Tests for Training Evaluation (Professional Paper 3-73).
(Alexandria, VA: Human Resources Research Organization, 1973).

4. Borow, H. "Philosophical, Practical, and Technical," p. 4.

5. Fitzpatrick, R. and Morrison, E.J. "Performance and Product Evaluation." In R.L. Thorndike
(Ed.), Educational Measurement (2nd ed.). Washington, DC: American Council on Education,
1971, p. 238.



CHAPTER ONE 



INTRODUCTION 







Before discussing the four issues facing performance testing (philosophical, technical, legal, and
implementation), a brief discussion of performance testing itself is the logical place to begin.
Stephen Slater provides an overview of performance testing in Chapter One. He begins with a
discussion of performance testing focusing on the "range of test stimulus characteristics,
response characteristics, and surrounding conditions, illustrating the distinctions between
performance tests and other kinds of tests." A typology of performance tests and a discussion of
advantages and disadvantages follow. The remainder of the paper is focused on clarifying
testing purpose, technical considerations, and cost considerations with performance testing.






Introduction to Performance Testing

Stephen J. Slater

Oregon Department of Education

Salem, Oregon



The person who is considering using performance tests in vocational education is faced with
a staggering array of questions for which there are no easy answers. What constitutes a valid
measure of occupational competence? What types of tests are most useful for guiding
instruction? For evaluating program outcomes? What testing procedures result in the most
reliable data? How does one begin to develop an instrument when none exists? What criteria
should be used in evaluating tests developed by others? This is just a partial list.

Educators in other fields are also wrestling with these questions. The net effect is that
educational measurement is currently experiencing a period of change and reconceptualization
perhaps unprecedented since the days of Alfred Binet and James Cattell. While once the
standardized, norm-referenced objective achievement test modeled after Binet's, Cattell's, and
others' instruments was held in high regard, that unquestioned acceptance is eroding. Today we
are witnessing a broadening of testing options that is raising issues at a faster rate than they are
being resolved. A common thread running through these options is the complexity of human
competency and the inadequacy of the ubiquitous multiple-choice examination as a measure of
competence.

This chapter examines one facet of the testing options available to educators: performance
testing. Throughout this examination, we seek to provide the reader with a few answers to the
questions posed at the outset.

What is Performance Testing?

Defining the meaning of the term "performance test" is not as straightforward as it might
seem. As with any term in our language, its meaning has shifted over the course of time and also
in the way it has been used in specific contexts. For example, in the context of testing general
mental abilities, the label traditionally refers to tasks requiring a nonverbal response such as
arranging pictures in a logical sequence. In the armed forces, performance tests have been
synonymous with measures of psychomotor skills such as speed in putting on a gas mask or
disassembling a rifle. The purpose of this section is to propose a definition that conveys the
current meaning of performance testing in the field of educational measurement.

Cronbach defines "test" as ". . . a systematic procedure for observing a person's behavior
and describing it with the aid of a numerical scale or category system."¹ The big variable in this
definition is how the term "behavior" is operationalized; doing so prescribes the characteristics of
the stimulus eliciting the behavior, the type of response called for, and the conditions under
which the behavior is displayed. Operationalizing behavior in these three respects is a heuristic
technique for distinguishing between performance tests and other kinds of tests.



[...] range of test stimulus characteristics, response characteristics, and surrounding
conditions, illustrating the distinctions between performance tests and other kinds of tests.

Stimulus Characteristics of Tests. Any test contains a set of instructions, a prompt, a
[...] that initiates the examinee's behavior. In essence, the stimulus
[...] a task that may be simple or complex, structured or unstructured, ambiguous or
[...]. The stimulus can also vary in its fidelity or resemblance to naturally occurring,
real life stimuli.

For example, a student in an emergency medical technician training [...]
at a telephone receives a simulated (role-played) call from a parent whose child has just
swallowed an unknown quantity of medication. The parent is nearly hysterical, so the student
[...] types of poisons, procedural steps in eliciting information, and how to relay
information to rescue personnel.

The simulated telephone call draws upon the student's knowledge in [...]
otherwise not [...] may be observed.

Response Characteristics of Tests. A major distinction to be made among response
[...] respondent/operant dichotomy [...].

An example of a test permitting operant behavior is to give a student pilot a chance to land
an aircraft. [...] responses and use of judgment are unconstrained by any
[...] to apply power in a landing, and right-of-way rules.

[...] out, whether safety precautions are followed,




whether diagnostic information is interpreted correctly, and so on. On the other hand, the
examiner may only be concerned with the outcome or product of the task: whether the
malfunctioning part is in fact identified. The choice between process and product assessment is
influenced by a number of considerations such as testing purpose, nature of the task, and
relative costs of each approach. These considerations will be explored further in a later section.



Surrounding Conditions. Closely related to the stimulus characteristics discussed above are
environmental conditions under which a task is performed. McGuire has pointed out that
behavioral assessment in a naturalistic setting is often affected by the "accidents of nature and
the flow of real problems available at the particular place and the specific moment in time when
an assessment is to be made." This point brings us back to Cronbach's definition of a test as "a
systematic procedure . . ." The "noise" always present in reality can lead to "unsystematic
procedures" if care is not exercised in either of two respects: (1) standardizing the surrounding
conditions so it is possible to avoid confounding stimulus characteristics with irrelevant
environmental conditions, or (2) systematically sampling relevant surrounding conditions, building
them into the test itself as variations in the test stimulus. The former condition is typically easier
to satisfy than the latter, but often it is impossible to do either. In such a case, one must make
the assumption that uncontrolled situational variables do not bias the description of behavior.



An example of how environmental characteristics can influence behavioral assessment is the
classic case of evaluating student teachers' performance. The college supervisor making the
rounds to observe several preservice teachers may notice a distinct pattern in how closely
different ones adhere to their lesson plans. On one extreme, several seem never quite to make it
through the rudimentary concepts they want to get across. Another group breezes through its
planned activities, and the students are busily engaged in self-initiated projects. How does the
student teaching supervisor take into account the fact that the former group is assigned to
inner-city schools while the latter is located in suburban schools surrounding the university?

The three dimensions discussed above illustrate the ways in which performance tests differ
from traditional paper-and-pencil achievement measures. They also provide a framework for
describing variations in performance testing approaches and analyzing relative advantages and
disadvantages of alternative approaches. The next section proposes a typology of performance
testing approaches based on their relative fidelity to real life situations.


A Typology of Performance Tests 

Conceptual distinctions can be made among three primary types of performance testing
approaches: direct assessment, work sample methods, and simulation techniques. Each
encompasses a variety of measurement options and each has its own particular advantages and
disadvantages, affecting the choice of when to use a given approach.

Direct Assessment. The highest fidelity that can be achieved in assessing behavior required
for success in a real life setting is through direct observation of behavior (or its outcomes) in that
setting. Stimulus and response characteristics of the test and the surrounding conditions are
assumed to be equivalent to those present in naturally occurring situations. Behavior exhibited in
an actual work setting can be described in a variety of ways: the observer may use a rating scale
to record judgments of the individual's effectiveness in a number of dimensions, the observer
may record the presence or absence of predetermined behaviors on a checklist, or the observer
may count the frequency with which the individual exhibits a particular type of behavior in a
given time interval. Direct assessment of products or outcomes also can rely on rating scales or
checklists in which the results of the individual's performance are judged.
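
The three ways of recording observed behavior named above (rating scale, checklist, and frequency count) can be sketched as simple scoring routines. The Python sketch below is illustrative only: the function names, the 1-to-5 scale, and the sample dimensions are assumptions introduced here and do not come from the handbook.

```python
from statistics import mean


def rating_scale_summary(ratings: dict[str, int], scale_max: int = 5) -> float:
    """Average an observer's judgments across dimensions (each 1..scale_max)."""
    for dimension, value in ratings.items():
        if not 1 <= value <= scale_max:
            raise ValueError(f"{dimension}: rating {value} is outside 1..{scale_max}")
    return mean(ratings.values())


def checklist_summary(observed: dict[str, bool]) -> float:
    """Proportion of predetermined behaviors recorded as present."""
    return sum(observed.values()) / len(observed)


def frequency_per_minute(event_times_sec: list[float], interval_sec: float) -> float:
    """Rate per minute of one behavior within the observation interval."""
    # Only events that fall inside the interval [0, interval_sec) are counted.
    count = sum(1 for t in event_times_sec if 0 <= t < interval_sec)
    return count / (interval_sec / 60)


if __name__ == "__main__":
    # Hypothetical observation of one student teacher.
    print(rating_scale_summary({"clarity": 4, "pacing": 3, "rapport": 5}))
    print(checklist_summary({"states objective": True, "checks understanding": False}))
    print(frequency_per_minute([10, 70, 130, 400], interval_sec=300))
```

Each routine reduces one kind of observation record to a single number; which reduction is appropriate would depend on the testing purpose, which the chapter takes up later.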

Direct assessment can vary in its obtrusiveness; that is, the individual may or may not be
aware that his or her performance is being (or will be) evaluated. This constitutes an important
advantage for the technique, relative to the other performance testing approaches discussed in
this chapter. To the extent that the behavior is exhibited in an ongoing, nonartificial environment,
unobtrusive observations can be made of how individuals do perform as opposed to how they
can perform. Often it is not ethical or feasible to employ unobtrusive measures, but direct
assessment methods do afford the opportunity by virtue of the fact that environmental conditions
are not manipulated for the sake of performance testing.

An example of direct assessment is the case mentioned above where student teachers are
observed in their actual classrooms. Another example is the behind-the-wheel driving test
administered to drivers' license applicants. A third example is the evaluation of interns in clinical
settings. Product evaluation as a direct assessment method is exemplified by judging a finished
piece of work done by an apprentice plumber, such as determining the watertightness of pipes
joined together. All examples are characterized by nonmanipulation of the stimulus and
environmental characteristics surrounding the situation in which the performance is observed.

iVor^ Sample Methods. Evaluation of work samples Is distinguished from (^Irpct assessment, 
techniques primarily on the busis pf where the performance Is observed. Whereas direct 
assessment takes place^in the s«|tthg where the behavior Is normally displayed, work samples 
can be obtained In a more contrdjied setting. 

A second dlstlngolshlno feature is the examiner s ability to preipecify the task. Under 
direct aiaetament- approach, tasks preientliHg themselves to the Individual are not manipulated 
eSer? InTwork sample measure, on the other hand, theintent Is tp standardize tasks 

across examinees. . 

A third distinction is the time frame in which the task is performed. Direct assessment
methods do not impose time limits for task performance, but work samples are often
standardized in terms of time allowed for task completion.

Direct assessment and work sample methods share certain common features as well. The
tools, materials, and other resources the examinee works with are equivalent to those available
on the job. Tasks presented to the examinees are equivalent to tasks performed in real life settings. In
terms of the test characteristics discussed above, work samples have high fidelity to real life
tasks in the stimulus and response dimensions, but surrounding conditions tend to be somewhat
artificial. Furthermore, even though the test stimulus mirrors that found in real life,
it is in fact controlled and specified by the examiner, enhancing replicability of the task across
examinees.

Examples of work sample techniques abound in vocational education. The Plymouth
Troubleshooting Contest is a case in which a discrete set of auto mechanic skills is assessed
under standardized conditions. Here the task is specified in advance, requiring contestants to
find the source of trouble in a malfunctioning automobile using whatever procedures they
deem appropriate. Their score is based on speed in locating the defective component; as such,
this is an example of product evaluation.



A second example of a work-sample test is the Seashore-Bennett Stenographic Proficiency
Test administered to prospective secretaries. In this test, a taped voice dictates five business
letters of varying lengths at different speeds. Examinees are given thirty minutes to transcribe
their shorthand notes. Again, this is an example of product evaluation in that the examiner is
interested only in speed and accuracy.

A work-sample performance test exemplifying process evaluation is the case in which
preservice teachers are asked to prepare and teach a mini-lesson on a given topic to a small
group of students. The performance is videotaped and later evaluated by the master teacher, the
student teacher, and perhaps the student's peers. Typically, a detailed coding form is used to
quantify the types of behavior exhibited, such as using different questioning strategies, giving
students positive reinforcement, following up on students' responses, maintaining eye contact,
and so on. In this type of microteaching work sample, the intent is to evaluate various
components of overall performance for the purpose of helping the student teacher improve in his
or her areas of weakness.

Simulation Techniques. As the term is currently used in educational measurement,
simulation refers to the process of abstracting some aspect of reality and concretely representing
it in the form of a specific task that examinees are expected to perform. Simulation accounts for
an enormous spectrum of performance testing approaches, varying in their degree of
"abstraction" from real life situations. At one extreme, simulation overlaps with work sample
methods in tasks that recreate problems and events occurring in an actual work setting. At the
other extreme, simulation techniques can sacrifice some fidelity in both stimulus and response
dimensions for the purpose of gaining more control over the testing situation or avoiding the
costlier aspects of duplicating reality. The range of performance testing approaches labeled as
simulations includes paper-and-pencil problem solving exercises, dyadic or small group
role-playing techniques, man/machine interactions, computerized games, and audiovisual
representations of real life stimuli to which examinees react.

In terms of fidelity to actual situations, simulation techniques cover the considerable middle
ground between objective paper-and-pencil examinations and work samples or direct
assessment. Unlike the latter two types of performance testing approaches that maintain high
fidelity in the stimulus and response characteristics of tasks, performance tests labeled as
simulations imitate but do not duplicate reality in these two dimensions. Of course, the
conditions surrounding the simulated task are typically unlike those characteristics of real life
situations.

Use of simulation as a formalized technique for performance evaluation is relatively recent.
In contrast, use of simulation in training can be traced to the sand table military war games of
the nineteenth century, if not earlier. Not until World War II were simulation and gaming
techniques systematically developed for assessment purposes. In the years just prior to World
War II, the German Army developed standardized situational tests of team performance to select
and train military personnel. British and American explorations in the use of simulation for
assessment were soon to follow.


In 1943, a procedure for selecting espionage agents to serve in the Office of Strategic
Services (OSS) took form. The central feature of the three-day OSS assessment program was
the use of situational tests designed to elicit behavior predictive of performance in actual
settings. Recruits were observed in several individual and group-based exercises and then rated
on such dimensions as leadership, practical intelligence, motivation, social relations, and
emotional stability. For example, one task required a group of six candidates to transport a
heavy rock, a log, and themselves across an eight-foot stream, using only a few boards (less than
eight feet long), three lengths of rope, a pulley, a barrel, and whatever trees were available.
Another exercise, the "Manchuria Test," provided background facts to a candidate who was then
to prepare propaganda designed to lower the morale of Japanese railway workers.


Following the war, the use of simulation techniques to predict future performance found
applications in industrial settings as well. The first use of "assessment center" techniques, as
they came to be known in business contexts, is accredited to AT&T where, in 1957, Douglas
Bray and Robert Greenleaf initiated their longitudinal study tracing the progress and
development of young managers in the company. The research project began with the
participation of recently hired employees in a three-day series of business games, leaderless
group discussions, interviews, and an in-basket exercise. Participants were rated on twenty-five
behavioral dimensions and predictions were made regarding their likelihood of reaching middle
management. Neither the ratings nor the predictions were released to the organization for a
period of eight years, at which time those participants still at AT&T were reassessed. The
forbearance of the researchers in withholding the career progress predictions, which were
highly accurate, enhanced credibility of the technique's predictive validity.

More recently, simulation has been used as an evaluation technique in educational settings.
Beginning in the early 1960s, Christine McGuire and her colleagues at the University of Illinois
Medical Center developed several simulations. They have found simulation useful in assessing
four critical components of competence: observational and interpretive skills, problem solving
skills, interpersonal and communication skills, and technical skills. The four major types of
simulation procedures used in medical education are: (1) paper-and-pencil "programmed"
examinations simulating an encounter between physician and patient in which examinees'
abilities in clinical diagnosis and patient management are assessed, (2) audiovisual simulations
that require the examinee to describe and interpret auditory or visual information (e.g., heart and
lung sounds), (3) role-playing oral interviewing exercises in which the examinee elicits diagnostic
information from a trained "patient," usually used to assess interpersonal skills as well as clinical
information gathering, and (4) computer-managed robots that can be programmed to present the
examinee with a variety of problems and respond appropriately to different physician
interventions.

These brief descriptions of simulation techniques developed over the last thirty-five years do
not begin to convey the richness and variety represented by this approach to performance
testing. The work mentioned above covers only a few landmark accomplishments in the
assessment of complex human performance. The technology of simulation is expanding rapidly
in response to the need for valid and economical predictors of competence in real world settings.

Advantages and Disadvantages of Performance Testing Approaches

So far this chapter has introduced the critical dimensions on which performance tests differ
from traditional academic achievement tests and that serve to discriminate among various types
of performance tests. The preceding discussion has also proposed a three-part typology of
performance testing approaches, illustrated with specific examples. However, the foregoing has
not directly addressed factors affecting the use of performance tests.



The premise guiding this chapter is that any given measurement tool (whether a direct
assessment method, work sample, simulation, or objective achievement test) is neither inherently
valuable nor inherently worthless. Each is suited to a particular set of testing purposes,
possesses different psychometric properties, is meant to measure different types of behavior, and
requires greater or lesser resources in test planning, development, administration, and scoring.
The selection and use of a specific instrument is carried out with any rationality only when
factors such as these are considered.

A danger to be avoided in adopting any measurement approach is to overemphasize one test
evaluation standard at the expense of other relevant criteria. The following represents one
reasonably comprehensive way of analyzing the utility of performance tests. The intent is not to
promote one performance testing approach over others, but to point out the crucial questions
that should be addressed. These considerations are discussed in three categories: (1) interaction
of testing purpose with testing approach, (2) technical considerations (i.e., reliability and
validity), and (3) cost considerations.



Testing Purpose. One of the more powerful factors influencing the design and choice of a
measurement tool is the purpose for which data are being collected. Test use in education spans
a variety of purposes. The most rudimentary way of classifying testing purposes is to ask the
following questions:

• Are test scores sought for individual students or will scores be aggregated across students?

• What kinds of decisions will be influenced by the test data?

Four conceptually distinct testing purposes can be identified, representing different ways of
answering these two questions. These are: (1) formative program evaluation, (2) summative
program evaluation, (3) instructional management and decision making, and (4) student
certification. The former two are characterized by test score aggregation across individual
students, while the latter two call for the collection and interpretation of individual student data.
All four testing purposes affect different kinds of decisions. These are discussed briefly in the
following paragraphs.

Formative program evaluation is conceived as an integral part of the process of curriculum
development and improvement. Formative evaluation provides answers to questions posed by the
developers of a program, answers that serve to pinpoint its strengths and weaknesses. As
pointed out by Cronbach, formative evaluation is ". . . used to understand how the course
produces its effects and what parameters influence its effectiveness."

The goal of summative program evaluation, on the other hand, is to confront the question of
a program's overall merit, relative to its competition. The results of summative evaluation are
directed toward those who control the decisions about support and adoption, rather than toward
the developers of the program. Whereas understanding the reasons for a program's success or
failure is the goal of formative evaluation, ". . . understanding is not our only goal in evaluation.
We are also interested in questions of support, encouragement, adoption, reward, refinement,
etc. And these extremely important questions can be given a useful, though in some cases not a
complete, answer by the mere discovery of superiority."



The premise underlying testing for the purpose of instructional management and decision
making is the notion that group-based instruction within a fixed curriculum using invariant
teaching strategies does not enable each student to reach his or her highest level of learning.
Rather than serving the purpose of sorting students relative to their peers, student testing is
increasingly being used to design and redesign each individual's instruction to promote mastery
of the learning task. Instructional management, conceived in this way, requires the integration of
testing and instruction, in which the teacher is provided with precise descriptions of each
student's learning as a guide to modifying the instruction.

Testing for student certification refers to the practice of conferring institutional rewards (e.g.,
diplomas, documents certifying competence, advanced placement in a course sequence) on the
basis of test performance. This use of test data is gaining considerable attention as a result of
minimum competency testing programs enacted in several states and local school districts as
well as the "early exit" examinations administered in California and Florida high schools. The
rationale behind testing for student certification is that "seat time" is inadequate as a proxy for
student competence and, therefore, more objective evidence of student achievement is necessary
to restore meaning to the diploma.

How is the selection of a performance testing approach related to the testing purposes
discussed above? First, one can argue that both formative program evaluation and instructional
management require student performance data not just on achievement of terminal course
objectives but also on "enabling" objectives: skills that constitute necessary but not sufficient
conditions for success in achieving ultimate course goals. The intent is to identify points at
which the instructional program is faltering, either across all students (in the case of formative
evaluation) or with respect to an individual student's learning (in the case of instructional
management).


Testing for achievement of enabling objectives implies the need for process measures as
opposed to product evaluation, although there will tend to be exceptions to this rule. For
example, at the end of an auto mechanics course, students might be expected to troubleshoot a
specified set of mechanical defects. The enabling skills would include knowledge of basic engine
principles and functions, knowledge of the interrelations among engine components, ability to
interpret information about an operating engine, ability progressively to narrow down the most
likely problems, and proficiency in integrating multiple types of information. A testing approach
that would indicate student deficiencies in such enabling skills might include a set of work
samples, scored from a process evaluation perspective, supplemented with paper-and-pencil test
items measuring knowledge of basic facts and principles. By comprehensively testing the
en route course objectives, the instructor can avoid wasting time reteaching skills already learned
or neglecting to teach essential skills not mastered by one or more students.

Summative program evaluation and student certification, on the other hand, both call for a
product evaluation approach when feasible. This position is taken for the following reasons.
First, the decision maker is interested in knowing whether students achieved the objectives
stated as end-of-course outcomes; knowing why students failed to meet these performance
standards is of lesser importance. Second, performance testing is an expensive undertaking
under any circumstances; by focusing student evaluation only on terminal objectives and scoring
performance from a product perspective, valuable resources are made available to do a better
overall job of testing. Third, product evaluation tends to yield more reliable scores than those
made on the basis of fleeting observations. Often, a task results in a durable product that can be
judged by multiple evaluators or described in objective terms. For example, the ability to grind a
machine part to a prescribed tolerance can be objectively (and reliably) scored, whereas the
psychomotor skills leading to that product are more subjectively judged.



The latter point (reliability of scores) relates to another type of interaction between testing
purpose and the choice of a measurement approach. The key question is how crucial, and
irreversible, is a particular use of test data. In the case of student certification, the answer is
obvious: Decisions made for this purpose affect individual students in important, relatively
permanent ways. Any test data supporting these decisions must be highly valid and reliable. The
remaining three testing purposes are most likely ranked on this dimension in the following order:
summative program evaluation (because program continuance/termination is a major, often
irreversible decision), formative program evaluation, and instructional management (ranked last
because diagnostic information about a given student is typically supplemented with other types
of data and the effects of inaccurate test data are relatively impermanent).

Since crucial and irreversible decisions based on test data demand evidence of high validity
and reliability, how do these requirements affect the selection of a testing approach? With
respect to validity, whatever testing approach is used should measure the skills it claims to
measure at the level of complexity and sophistication at which they are taught and learned. For
example, to certify student competence in a computer programming course, a test should
determine whether students can actually write and debug a program at a given level of
complexity. Multiple-choice items or other types of respondent measures (if these constitute the
entire certification exam) are not likely to reflect the intended course outcomes in their entirety.
Beyond specifying a testing approach possessing face validity and content validity as a measure
of terminal course objectives, it becomes an empirical question as to which type of performance
test is the most valid predictor of competence as a computer programmer. Direct assessment,
work samples, and simulation all would seem to hold no a priori advantages over one another in
terms of predictive validity. It is largely the care with which a performance test is constructed
and administered that determines its predictive validity. The reader interested in specific test
development steps that are necessary in creating valid performance tests is encouraged to
consult Klein's chapter in this volume.

With respect to reliability, product evaluation tends to produce more consistent scores
across multiple raters than those obtained through process evaluation approaches. Standards for
judging products or tangible outcomes tend to be more objective; hence, such judgments should
be more reliable. On logical grounds, it is simply more difficult to specify the appropriate steps
leading to a given product than it is to specify the desired characteristics of the product or
outcome itself. If examinees can take a variety of routes in completing a task, it is presumptuous
in many cases to argue that one procedure is inherently superior to the others.

In selecting between operant and respondent measures, the issue of reliability presents the
test user with a perplexing problem. Multiple-choice tests of respectable length (e.g., twenty
items) routinely yield reliability coefficients in the .80 to .90 range. Users of performance tests in
which behavior is observed and rated by two judges are very pleased when the interrater
reliability coefficient exceeds .60. Faced with a choice between a highly reliable objective test
and a moderately reliable performance test, what is the test user to do? Go with the reliable but
less-than-valid respondent measure, or opt for the converse psychometric configuration of a
direct assessment or work sample technique? Both indices of test quality need to be weighed
carefully when important decisions are going to be affected by test results. This author would
argue for using the more valid measure and then taking all possible steps to boost reliability.
However, this is an oversimplified response to a complex dilemma.
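
To make the interrater reliability coefficient mentioned above concrete, the following is a minimal Python sketch. It assumes the coefficient is computed as a Pearson correlation between two judges' ratings of the same examinees; the chapter does not say which statistic is meant, so this is one common reading rather than the author's stated method. The function name `interrater_reliability` and the ratings are hypothetical illustrations.

```python
from math import sqrt

def interrater_reliability(judge_a, judge_b):
    """Pearson correlation between two judges' ratings of the same examinees."""
    if len(judge_a) != len(judge_b) or len(judge_a) < 2:
        raise ValueError("need paired ratings for at least two examinees")
    n = len(judge_a)
    mean_a = sum(judge_a) / n
    mean_b = sum(judge_b) / n
    # Sum of cross-products and sums of squares around each judge's mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(judge_a, judge_b))
    var_a = sum((a - mean_a) ** 2 for a in judge_a)
    var_b = sum((b - mean_b) ** 2 for b in judge_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("ratings must vary for the correlation to be defined")
    return cov / sqrt(var_a * var_b)

# Hypothetical ratings (1-5 scale) given by two judges to six examinees.
judge_a = [4, 3, 5, 2, 4, 3]
judge_b = [5, 3, 4, 2, 3, 3]
r = interrater_reliability(judge_a, judge_b)
print(f"interrater reliability: {r:.2f}")
```

With these invented ratings the coefficient comes out at about .74, above the .60 level that the text describes as pleasing for observed performance but below the .80 to .90 range cited for multiple-choice tests.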


Technical Considerations. The psychometric properties of validity, reliability, and objectivity
that pertain to any measure of human behavior can be used as a framework for analyzing the
advantages and limitations of performance tests. The present section extends the foregoing
discussion by focusing on the relative strengths of direct assessment, work sample, and
simulation techniques in these respects. Suggestions are also offered for increasing the validity
and reliability of performance testing. A complete treatment of the psychometric issues
related to these performance testing approaches, as well as a review of pertinent empirical
studies of reliability and validity, is beyond the scope of this chapter. The reader is referred to
the chapters by Perloff and Klein in this Handbook for more thorough discussion of these
technical issues.



Real life situations are difficult to control with sufficient precision to ensure uniformity of
conditions across several examinees. Thus, as the performance testing technique approaches the
reality of the criterion behavior it is intended to predict, standardization of the stimulus and
surrounding conditions becomes more difficult to achieve. Fitzpatrick and Morrison, in
summarizing their analysis of the reliability and validity of performance tests, state:

the more closely one tries to simulate a real criterion situation, the less reliable
will be one's measurement of the performance. The dilemma of simulation is that
increasing fidelity and comprehensiveness appear at least in a general way to be
associated, on the one hand, with increasing validity but, on the other hand, with
decreasing control and thus reliability.

As these authors point out, if performance tests are based on a sample of real life
performance (i.e., in direct assessment) that sample "must be taken under conditions
that preserve the stimuli and responses that occur in real life." Because real life conditions vary
from occasion to occasion, it is desirable to measure performance a number of times under a
wide variety of conditions.

A driving test given at midday would present the examinee with a somewhat different set of
stimuli, requiring different responses, than one conducted during rush hour. For example,
the demands on the driver change with the volume of traffic, but city congestion may preclude
observing an examinee's adherence to speed limits. Observation of the same driver under varying
driving conditions would tend to yield different proficiency estimates.

The problem of standardization in direct assessment not only affects reliability (as
illustrated above) but also influences validity of the measure. That is, if the intent
is to generalize from a sample of behavior taken in an actual work setting to performance in the
entire array of relevant tasks, evidence of the sample's representativeness is necessary.

Work sample and simulation techniques directly address the issue of standardizing test
stimuli and surrounding conditions by controlling the extraneous factors that might influence
performance. McGuire notes these advantages of simulation (in the context of assessing
competence of health professionals) in the following ways:

Predetermination and preselection of the task. It is obvious that simulation makes it possible
to predetermine precisely the exact task which examinees are to be required to perform.
Further, it is clear that, in contrast with the "noise" always present in reality, simulation
makes it possible to focus on the elements of primary concern in a testing situation and to . . .

Standardization of the task. Just as a given student can be repeatedly confronted with the
same task, simulation enables an examining body to standardize the task for all examinees.
In short, all examinees can be given exactly the same problem to cope with, and this can be
accomplished without an attack on nature.

Improved sampling of performance. By standardizing the task and focusing on the most
significant aspects in each, it is possible in a given time period to sample an individual's
performance with respect to a much broader and more representative group of problems . . .
which reality can rarely provide in a reasonable time frame. In carefully developed
simulations, any problem, ranging from the most urgent emergency to illness spanning
many years, can be collapsed into a half-hour exercise and summoned on demand.

A second technical issue in performance tests that depend on observation of behavior is the
problem of controlling bias in impressionistic judgments. Idiosyncratic rater biases such as
leniency/stringency errors, the halo effect, and unwillingness to render extreme judgments are
problems that lower reliability and cast doubt on the validity of measurement. These errors are
the result of many factors that boil down to "the rater's willingness to rate honestly and
conscientiously, in accordance with the instructions given to him . . . and factors that limit his
ability to rate consistently and correctly, even with the best intentions." For example, the rater
may identify with the person being observed, resulting in an overly generous rating. This effect is
particularly troublesome when this rater is that person's trainer or supervisor, who would prefer
to bias the rating rather than risk reducing morale in the organization. Factors limiting raters'
ability to rate accurately include lack of opportunity to observe, the covertness of the trait being
rated (e.g., self-sufficiency), ambiguity of the quality to be observed (e.g., supervisory ability),
lack of a uniform standard of reference on the rating scale, and specific rater biases and
idiosyncrasies.

These types of rating problems apply equally to direct assessment, work samples, and
simulation techniques in which behavioral observation is the source of test data. In general,
when objective performance standards are available (as in some types of product evaluation
methods or process evaluation checklists), these problems are not pronounced. However, many
performance testing approaches rely on impressionistic judgments that can lead to measurement
error.


One of the most promising techniques for overcoming such types of rating error is the use of
behaviorally anchored rating scales. Rather than using such global scale anchors as superior,
average, good, and so on, behaviorally anchored rating scales define scale points with
unambiguous descriptions of observable behavior. By providing a clear definition of the trait
being rated and a more objective frame of reference for judging individuals on that trait,
behaviorally anchored rating scales limit raters' tendencies to subconsciously bias scores in the
ways mentioned above. For further discussion of rating errors and strategies for attenuating
them, the reader is referred to the chapter by Perloff in this volume.
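
The contrast between global anchors and behavioral anchors can be sketched as a small data structure. The Python fragment below is an illustration only: the trait is borrowed from the microteaching example earlier in this chapter, but the anchor wording, the four-point range, and the names `FOLLOW_UP_SCALE` and `describe_rating` are hypothetical and are not taken from any published scale.

```python
# Hypothetical behaviorally anchored scale for one trait from the
# microteaching example: following up on students' responses.
# Each scale point is tied to a description of observable behavior
# rather than to a global label such as "average" or "superior".
FOLLOW_UP_SCALE = {
    1: "Moves to the next question without acknowledging the student's answer.",
    2: "Acknowledges the answer but does not probe or build on it.",
    3: "Asks one follow-up question that builds on the student's answer.",
    4: "Probes the answer and invites other students to extend it.",
}

def describe_rating(scale, point):
    """Return the behavioral anchor a rater commits to when assigning a point."""
    if point not in scale:
        raise ValueError(f"{point} is not a defined scale point")
    return scale[point]

print(describe_rating(FOLLOW_UP_SCALE, 3))
```

The design point is that a rater who assigns a 3 is asserting that a specific behavior was observed, which gives a second rater something concrete to agree or disagree with.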

Cost Considerations. Costs in developing, administering, and scoring performance tests
constitute the greatest obstacle to their use in education. The technical problems discussed
above are surmountable; expense in making good use of performance testing approaches is
more difficult to avoid.

Conducting a cost analysis of various testing approaches is a tricky business. First, the test
user must have clearly in mind the behavior to be measured and the appropriateness of
alternative testing techniques in providing these measures. In some cases, certain testing
alternatives will be ruled out at this point, regardless of cost. However, in many instances, two or
more types of performance tests will be feasible and appropriate. At this point, the hypothesized
marginal gain in validity and other desired attributes must be weighed against marginal cost
differences. Without empirical evidence concerning the validity and reliability of the alternatives
under consideration, as well as accurate cost data, the decision will necessarily be somewhat
subjective.



Usually the purpose of testing will guide decisions regarding the amount of resources
devoted to test development and administration. For example, student certification and
summative program evaluation generally demand higher standards of validity and reliability that
are translated into higher costs. When serious decisions are at stake, more effort must be
devoted to test development activities such as (1) validating the test content against tasks
performed in real world settings, that is, conducting job analyses and matching tested skills with
essential skills identified empirically; (2) carefully specifying performance criteria, instructions to
students, and guidelines for examiners to control for various types of measurement error; and (3)
conducting reliability and predictive validity studies based on pilot test administration. Greater
test administration costs are warranted in the areas of (1) increasing the number of samples of
behavior obtained for a given examinee; (2) increasing the number of raters to control certain
rating errors and enhance reliability; and (3) investing greater resources in the use of full scale
equipment required under direct assessment or work sample approaches.
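
One way to reason about how much additional raters might enhance reliability is the Spearman-Brown prophecy formula. This is a standard psychometric result that the chapter itself does not cite, so it is brought in here as an assumption, and it presumes that raters are interchangeable (parallel) and that their ratings are averaged. The Python sketch below uses hypothetical function names and the .60 figure mentioned earlier in the chapter purely as an illustrative input.

```python
import math

def spearman_brown(single_rater_reliability, n_raters):
    """Projected reliability of the mean rating of n parallel raters."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)

def raters_needed(single_rater_reliability, target):
    """Smallest number of parallel raters projected to reach the target."""
    r = single_rater_reliability
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Solve target = k*r / (1 + (k-1)*r) for k, then round up.
    k = target * (1 - r) / (r * (1 - target))
    # The small tolerance guards against floating-point overshoot
    # when k is an exact integer.
    return max(1, math.ceil(k - 1e-9))

# Illustrative input: a single-rater reliability of .60 and a target of .80.
print(round(spearman_brown(0.60, 2), 3))
print(raters_needed(0.60, 0.80))
```

Under these assumptions, averaging two raters is projected to raise reliability from .60 to .75, and three raters would be needed to pass .80. Each added rater is also an added administration cost, which is the trade-off the paragraph above describes.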

Testing for the purposes of instructional management or, in some cases, formative program
evaluation generally would allow relaxing the above standards. More informal procedures in test
development and administration do not necessarily obviate the advantages of performance tests
over respondent tests. One could argue that many instructional activities occurring in the
classroom are variants of performance tests. Students routinely turn in projects or perform tasks
that are in essence performance tests. The instructor's time spent in devising and grading these
assignments is traditionally viewed as an investment in instruction, rather than an added testing
burden. Granted, the more sophisticated simulation techniques and work sample methods
require more effort to design, but the payoffs in student learning adequately compensate for the
added expense.

An interesting aspect of the cost in performance testing is the issue of test security. Test
developers who market standardized achievement tests are chafing under new laws that require
the release of test items to the public. This results in a greater expense in developing, norming,
and statistically equating new items for nationally administered tests. Test security is not a major
issue in many types and uses of performance tests. Irrespective of whether the examinee knows
the content of a work sample, he still must perform the task at a certain level of proficiency. In
other words, it is hard to cheat when the task is to solder electrical connections or to type a
letter. The exception to this rule occurs when knowledge of specific test content is likely to give
the examinee an unfair advantage.

Summary

This chapter has sought: (1) to identify the essential dimensions on which performance tests
differ from traditional academic achievement tests, and in so doing propose a conceptual
definition of performance testing, (2) to develop a three-part typology of performance testing
approaches illustrated with specific examples, and (3) to examine issues affecting the advantages
and limitations of performance tests. The unstated intent of the chapter is to enhance
rationality in test use. As should be apparent, we have a great deal to learn about performance
testing in vocational education and in other educational fields as well. The hope is that this
chapter and those that follow will advance that understanding.






INTRODUCTION 



Notes

1. Lee J. Cronbach, Essentials of Psychological Testing, 3d ed. (New York: Harper and Row, 1971),
p. 26.

2. David C. McClelland, "Testing for Competence Rather than for 'Intelligence,'" American
Psychologist 28 (1973): pp. 1-14.

3. Paul S. Pottinger, "Competence Testing as a Basis for Licensing: Problems and Prospects."
Paper presented at the Conference on Credentialism, University of California at Berkeley, 1977.

4. Christine H. McGuire, "Simulation as an Evaluation Technique." Paper presented at the 1976
Annual Invitational Conference of the National Board of Medical Examiners, Philadelphia, 1976.

5. The driving test is, to some extent, manipulated by the examiner through the directions given to
the examinee (e.g., "make a left turn here," "park the car in that space"). However, it is classified
as direct assessment because the flow of events the driver must respond to is not artificially
structured.

6. H.G. Seashore and G.K. Bennett, "A Test of Stenography: Some Preliminary Results," Personnel
Psychology 1 (1948): pp. 197-209.

7. Jack L. Maatsch and Michael J. Gordon, "Assessment Through Simulation," in Evaluating
Clinical Competence in the Health Professions, ed. by Margaret K. Morgan and David M. Irby
(Saint Louis, MO: C.V. Mosby, 1978), p. 123.

8. A.W. Pennington, "History and Classification of War Games," in First War Gaming Symposium
Proceedings, ed. by J. Overholt (Washington, DC: Washington Operations Research Council,
1961).

9. Christine H. McGuire, Lawrence M. Solomon, and Phillip G. Bashook, Construction and Use of
Written Simulations (New York: Psychological Corporation, 1976).

10. Office of Strategic Services Assessment Staff, Assessment of Men (New York: Holt, Rinehart &
Winston, 1948).

11. Donald W. MacKinnon, How Assessment Centers Got Started in the United States: The OSS
Assessment Program (Pittsburgh, PA: Development Dimensions International, 1974).

12. Ibid.

13. D.W. Bray and D.L. Grant, "The Assessment Center in the Measurement of Potential for
Business Management," Psychological Monographs 80 (17, Whole No. 625, 1966). D.W. Bray, R.J.
Campbell, and D.L. Grant, Formative Years in Business: A Long-Term Study of Managerial Lives
(New York: Wiley & Sons, 1974).




14. As described by Norman Frederiksen, "An in-basket test [...] in order to simulate certain
aspects of the job [...] the letters, memoranda, records of incoming telephone calls, and other
materials that [...] in the in-basket of an administrative officer. The examinee is given [...]
office materials, such as memo pads, letterheads, paper clips, and pencils. He is told [...] of the
administrative job and that he is to respond to the materials in his in-basket as though he were
actually on the job, by writing letters and memoranda, preparing agenda for meetings, writing
notes or reminders to himself, or anything else that he deems appropriate." Quoted in
R. Fitzpatrick and E.J. Morrison, "Performance and Product Evaluation," in Educational
Measurement, 2d ed., ed. by Robert L. Thorndike (Washington, DC: American Council on
Education, 1971), pp. 243-44.

15. McGuire, "Simulation as an Evaluation Technique."

16. These are not exhaustive of all testing purposes that might be identified, owing to the
open-ended nature of the second question. However, they represent four commonly espoused
purposes which, for the sake of the present discussion, are sufficiently comprehensive. A more
complete treatment of testing purposes in education is found in Guidelines for Evaluating Basic
Skills and Life Skills Tests (Portland, OR: Clearinghouse for Applied Performance Testing,
Northwest Regional Educational Laboratory, 1979).

17. Lee J. Cronbach, "Evaluation for Course Improvement," Teachers College Record 64 (1963): pp.
672-83.

18. Michael Scriven, "The Methodology of Evaluation," in AERA Monograph Series on Curriculum
Evaluation, ed. by Robert E. Stake (Skokie, IL: Rand-McNally & Company, 1966).

19. Benjamin S. Bloom, J. Thomas Hastings, and George F. Madaus, Handbook on Formative and
Summative Evaluation of Student Learning (New York: McGraw-Hill, 1971).

20. In some cases the process is the product; thus, this discussion centers on cases in which an
identifiable outcome [results] from student performance. In those cases when only process
[is measured], students' behavior can be scored holistically (i.e., an overall judgment is
made).

21. Exceptions to this rule are not difficult to find. For certain tasks, process evaluation criteria
are highly definable, and judgments based on these criteria can be highly consistent. Efficiency
is one example. Taking the case of the computer programmer's certification [...], the
student who prepares a flow chart before writing the actual program should probably be judged
more efficient than the student who writes the same program by trial and error methods
[...] debugging procedures. The end result may be the same in both cases, but
the former student arrived at it more economically.

22. Fitzpatrick and Morrison, "Performance and Product Evaluation," p. 240.

23. Ibid.

24. McGuire, "Simulation as an Evaluation Technique," pp. 11-12.

25. Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and
Education, 3d ed. (New York: John Wiley & Sons, 1969). Emphasis in original.






26. Ibid.

27. P.C. Smith and L.M. Kendall, "Retranslation of Expectations: An Approach to the Construction
of Unambiguous Anchors for Rating Scales," Journal of Applied Psychology 47 (1963): 149-55.



CHAPTER TWO 






PHILOSOPHICAL ISSUES 




Regardless of the type of testing program used in a vocational education program, there
are many philosophical issues which undergird the selection and implementation of a testing
program. For example, tests may be used to measure student achievement, teacher performance,
or the performance of a program area or school district. Each of these reasons for testing has a
series of philosophical issues associated with it. Chapter Two discusses some of the
philosophical issues facing vocational educators who use performance testing.

Henry Borow begins the chapter with a discussion of the tacit assumptions of testing. He then
turns his attention to such concerns as problems of validity, democratic ideals, national priorities,
educational payoff, the mission of schools, vocational training, open admissions, and behavioral
objectives, and the relationship of each [to performance testing].

The second paper reviews several concerns raised by performance testing. In raising these
concerns, Jack C. Willers views performance testing as bringing "to the fore the biting theoretical
issues [...] plaguing education and our broader society today." He cautions that
[...] within the danger exists that
[...] will be exceeded when it is called upon to provide more than it has to offer.
The chapter ends with a discussion of these two papers by John F. Thompson.






Performance Testing and Social Responsibility:
An Issues Analysis

Henry Borow
University of Minnesota
Minneapolis, Minnesota

Tacit Assumptions of Testing

The use of tests to classify students, appraise their learning potential, and certify them for
diplomas or occupational competence is premised upon a number of beliefs about human
behavior and examination scores which are rarely made explicit. The first of these is that people
differ from one another in any specifiable trait and that such trait differences can be shown to
distribute themselves along a calibrated continuum. The second assumption addresses the
stability of measured trait differences. The notion that an examinee will fluctuate capriciously in
intelligence, mathematical aptitude, space perception, or bimanual dexterity is offensive to the
test user since such chameleon-like propensities make it impossible to render a trustworthy
characterization of the individual's psychological strengths and weaknesses. It should be noted
that this built-in assumption about trait stability extends beyond the question of the statistical
reliability of the testing instrument per se, which Perloff discusses, and is a quality with which
test theorists customarily imbue the test subject himself.

Thirdly, most current tests, particularly paper-and-pencil tests, are premised on the belief
that, by combining subject responses to a series of discrete items in additive fashion, we may
obtain a composite indicator of the internal trait which is being assessed. While the logic of such
an inference has not often been questioned by test theorists and test users, applied
psychologists schooled in the Gestalt psychology tradition of Köhler and Koffka have argued
that the essential wholeness of a trait is missed by aggregating small fragments of behavior. Lay
critics of testing, who tend to view any human trait as an entity, as Ding an sich, share this
skepticism.

A fourth assumption speaks to the practical import of measured trait differences, that is, our
ability to make a probabilistic statement about the student's performance level in some nontest
setting (for example, an advanced training program or a particular occupation) on the basis of
his test scores. It is not the student's standing on the test we really wish to know but, rather,
what that standing can tell about how the student is likely to perform in some training or work
for which he or she is being considered. Regrettably, scores on educational achievement and
performance tests are commonly viewed as definitive indices of the behavior we truly wish to
know. To leave this assumption unverified is to bypass the obligation of test validation.



The Problem of Validity 

The current controversy affecting all ability testing, including performance testing, centers
on the meaning and trustworthiness of test scores and the manner in which they are used in
institutional decision making. The precise quantity of scholastic and [...] tests
annually administered in the United States is not known, but it is commonly agreed that they
number in the millions. Assessments of the effectiveness of educational programs and personnel
decisions which significantly affect the careers and welfare of students and prospective workers
are commonly made on the basis of test results. In public forums and in the courts, insistent
questions are asked about the practice of denying admission to training programs or of failing to
certify candidates for job eligibility on the basis of low test scores. Are tests
suitable indicators of the individual competencies we wish to know about? This is the validity
question, a complex issue which is variously treated in this handbook by Slater, Perloff, and
Klein.

Long-standing and deeply rooted assumptions about the intrinsic merits of academic training
have made systematic inquiries about the validity of achievement tests as indicators of
subsequent nonschool performance appear irrelevant. If scholastic experience, including
vocational and technical education, is of value in and of itself, then the validity of any
achievement test can be defined as a function of the correspondence between the contents of
the test and the aims and contents of the course or curriculum it is designed to reflect. The
empirical question of what educational achievement test scores can accurately [tell us] about
students' extra-scholastic or future job performance has not often been confronted. The
predictive validities of CEEB and ACT scores have, of course, been frequently examined against
college grades, but how many studies carefully document the quantitative relationship between
scores on such tests, or on performance tests, and consequent career behavior?

Cronbach identifies four types of test validity: predictive, concurrent, content, and construct
validity.' Perloff's chapter, which presents a somewhat similar classification scheme, develops a
technique labeled "consistency validity" as an improvement over the classical predictive validity
approach. However, it is predictive validity (called "criterion validity" in Perloff's terminology)
which has commanded major attention from test researchers since the earliest decades of the
century. The construction and use of intelligence, scholastic aptitude, and vocational aptitude
tests have typically rested upon the rationale of predictive validation.

A similar record of vigorous validation work cannot be claimed for the field of performance
testing. With the exception of the military, the U.S. Army Air Force aviation psychology research
program for example, there have been few studies on the predictive validity of performance
tests, particularly where subsequent job behavior has been used as the criterion. Generally,
performance tests in vocational education may be said to have a high degree of content validity.
Their contents seem closely matched to the specific aims and subject matter of the curriculum.
Furthermore, performance tests in vocational education which take the form of work samples or
job simulations, especially where mechanical, electromechanical, or electronic testing devices
are involved, possess an impressive amount of so-called face validity. That is, they look strikingly
similar to the actual on-the-job task to be performed by the worker. Early developers of industrial
personnel tests called this characteristic of tests "verisimilitude."

Performance tests which have high face validity or verisimilitude are so compellingly
convincing in appearance that vocational educators, on-the-job training supervisors, and
industrial recruitment officers are tempted to accept scores derived from such performance tests
as tantamount to job proficiency. In fact, in so-called competency-based instructional programs,
students' scores on these tests may serve as the critical arbiter of successful program
completion. And yet, if criterion-referenced training and testing are strictly assumed to imply the
existence of an external standard of performance against which test behavior may be compared,
then the predictive validity of the majority of current performance tests remains unknown.

The failure of the typical performance test to tap relevant factors in on-the-job training
behavior or bona fide job behavior may limit its capacity to furnish a comprehensive and
accurate index of the student's competency. Performance tests customarily appraise an array of
cognitive and psychomotor skills. Yet, the affective domain is clearly part of on-the-job
performance. Successful performance in the vast majority of occupations rests at least partially
on worker attitudes and personal disposition, such as pride of workmanship, compliance with
rules of the workplace, quality of personal relations, dependability, and integrity. A summary
published in the 1950s of over 300 studies of worker failures revealed that in the majority of
dismissals, transfers, or nonpromotions due to unsatisfactory work records, factors of
inappropriate personality and character, including attitudes and ethics, were involved.

How might we attack this validity problem? The technique of construct validity offers a
promising approach. Let us suppose that a student who has completed a welding course and
done well on his terminal performance test later proves unsatisfactory as a worker because he
chafes under supervision and is described by the shop foreman as an uncooperative employee
who does not follow instructions or adjust to changing job routines. Suppose further that a test
of job adaptability has been constructed to measure such noncognitive or personality variables
as cooperativeness and flexibility. Let us now hypothesize that a selected sample of trainees, all
of whom have successfully completed the welding course (and passed the performance test) but
who have scored low on the job adaptability test, will subsequently be low-rated on the actual
job. If correlational findings (adaptability test scores vs. supervisory ratings with the welding
performance test scores held constant) confirm our hypothesis, we may conclude that the
adaptability test (measuring personal adjustment to the job) has construct validity, signifying that
job adaptability is a contributing factor in success on a welding job. More importantly, we have
produced a demonstrably more accurate indicator of student performance by combining
information from the cognitive and affective testing instruments.
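
The phrase "with the welding performance test scores held constant" describes what statisticians
call a partial correlation. The following is a minimal sketch of that computation, not a procedure
taken from this paper: it assumes the standard first-order partial correlation formula, and the
function names and all eight sets of scores are invented purely for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x with y after the linear influence of z is removed from both."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    # Clamp at zero to guard against tiny negative values from rounding.
    denom = math.sqrt(max(0.0, (1 - r_xz ** 2) * (1 - r_yz ** 2)))
    if denom == 0:
        raise ValueError("partial correlation is undefined: z fully determines x or y")
    return (r_xy - r_xz * r_yz) / denom


# Invented scores for eight hypothetical welding graduates (illustration only).
adaptability = [12, 15, 9, 20, 17, 8, 14, 19]        # job adaptability test
supervisor_rating = [3, 4, 2, 5, 4, 2, 3, 5]         # later on-the-job rating
welding_test = [81, 85, 84, 88, 83, 86, 82, 87]      # terminal performance test

print("zero-order r:", round(pearson_r(adaptability, supervisor_rating), 3))
print("partial r (welding score held constant):",
      round(partial_r(adaptability, supervisor_rating, welding_test), 3))
```

A partial correlation that remains clearly positive in real data would be consistent with the
hypothesis stated above; a sample this small could not, of course, settle the question.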

Performance Testing and the Democratic Ideal 

Neither coincidence nor advances in the technology of psychological measurement alone
can account adequately for the rapid ascendancy of educational testing. One must look beyond
the schools and understand the changes in American social philosophy wrought by rapid rates of
industrial expansion, urbanization, occupational diversification, and increased geographic
mobility. The traditional social and familial patterns of an earlier era, which stressed class
distinctions and restricted both occupational selection and movement across social class lines,
have weakened perceptibly. Privileged occupational inheritance and the deliberate training of the
youth of select families for continuity of leadership and power were gradually replaced by a way
of life which favored economic growth and productivity as national aims. Thus, the ability of the
individual to contribute to a burgeoning economy through demonstrated skill took on new
importance in the social selection process. Beginning about 1900, formal education increasingly
gained status with early job experience and then surpassed the latter as a mechanism by which
youth sought to qualify for socioeconomic advancement.

Special training curricula, legislation mandating eligibility requirements for occupational
entry, and competitive examinations became the modus operandi by which the young were
prepared and sorted for access to the world of work.






During this period the advocates of unrestricted growth in goods and services in a
free-market economy had seen promotion of the national good as the best way to insure the
welfare of the individual. What was good for America was supposed to be good for
Americans, all Americans. But the transition to a more dynamic industrial society, less
classbound and rewarding individual productivity, did not culminate in the attainment of the
ideal democracy that some had envisioned. We learned as a nation during the lusty social reform
movement of the turn-of-the-century era, again during the Great Depression of the 1930s, and
more recently during the widespread turbulence and unrest of the late 1960s and early 1970s,
that the meritocratic system, by which those judged best qualified to productively serve the
nation's growth needs are recognized and rewarded, is gravely flawed. Equality of educational
and occupational opportunity for all citizens remains a yet unattained goal, and the advancement
of the human condition has not always kept pace with economic progress. Ironically, the same
educational system which appeared to provide a vehicle for socioeconomic improvement came to
be seen by many among the disadvantaged as a barrier to personal advancement. Educational
policy in general, and minimal competency testing policy in particular, are now inextricably
caught up in this national dilemma. Some of the unresolved issues attendant upon this dilemma
are briefly identified later in the chapter.



National Priorities and Individual Welfare

It may be instructive to view this controversy as a conflict between the goals of optimum
manpower utilization, with gross national product as the primary criterion of the nation's health,
and the quite different objective of maximizing human potentialities. One seeks a rapid economic
growth rate, high employment, and high levels of production and consumption. The other implies
a bottom-line belief in the virtue of human uniqueness and its cultivation through liberal
education. As we have seen, the conditions which favor the achievement of either of these goals
are not necessarily facilitative of the other. The market for college graduates provides an
illustration. By the end of the 1960s, college students were confronting shrinking opportunities to
enter many higher-level occupational fields for which, a few years earlier, they had been
encouraged to prepare. Inevitably, educational program admissions policies and testing and
certification practices will reflect the impact of such changing employment supply-and-demand
ratios. Just as surely, the question of "For whose good, for the nation or for the individual?"
must again be raised with reference to the purposes of performance testing. And predictably,
there will be no confident consensus and no facile solutions.



Education Payoff: Is It Worth the Investment?

Like other institutions (government, business, and the military), formal education has
witnessed a lessening of public confidence and persistent calls for proof of worth. There can be
little doubt that the current demand for accountability in education has given performance
testing and competency-based programming an increased measure of importance and urgency.
Although education continues to occupy a modestly favorable rank in the nation's scale of
institutional values, public acceptance is now less an article of faith and is more clearly
dependent upon a demonstrable track record. The message seems to be: good education will be
supported but ineffectual educational programs will be trimmed or eliminated. Of particular
concern to some critics are the claimed economic benefits of vocational education. Is the
investment in tax dollars justified? Is there a market for the graduates of occupational training
programs? Does the nation face the imminent prospect of structural unemployment,
underemployment, and job "spillover" for tomorrow's legions of graduates? In one way or
another, such questions are insistently posed or clearly implied in federal and state educational
legislation, the Vocational Amendments of 1976, and in the charges given to the regional and
topical research and development centers. Moreover, the rates of economic return on the sizable
investments in human capital which educational systems require are now being studied by
economists through cost-benefit analysis. Thomas, in making the case for applying
cost-effectiveness criteria to school programs, advocates studying "educational organizations as
open systems which are linked to the total economy through a set of inputs and outputs."'

One response of the schools to the demands for accountability has been to confront with
renewed vigor the issue of quality control in occupational education. A three-pronged attack on
the problem has been mounted: (1) curriculum re-examination and reform, (2) improved
techniques of instruction, and (3) improved monitoring of the effectiveness of training.
Performance tests can play a significant role in all three of these approaches. It is the last of
these applications, however, which appears most open to public scrutiny and most likely to
attract the interest of school boards, legislators, employers, and concerned citizens' groups. And
it is from these same groups that hard questions are likely to come concerning the purposes and
trustworthiness not only of the educational system but also of the tests, including performance
tests, which are used to appraise schools and students.

Performance Testing and the Mission of the Schools

The vindication of educational testing must rest ultimately upon the efficacy that
measurement devices contribute to monitoring teaching and learning in the schools. All
educational achievement tests, if they are at all relevant, reflect the undergirding philosophy and
aims of the schools.

A visitor from another planet might deduce a great deal about the premises and a priori
value network of the conventional academic track American secondary school from a detailed
study of its examination contents. He/she would discover that the typical school achievement
tests emphasize mastery of verbal and quantitative systems of communication (linguistic and
mathematical knowledge) and comprehension of the terminology, facts, and principles of the
major formal disciplines (natural sciences, social studies, and the humanities). He/she would
learn, further, that society's ready acceptance of such masteries as the indicators of subsequent
success and socially responsible citizenship in the adult world resides less on a solid basis of
empirical evidence and more upon a leap of faith.*

If our extraterrestrial visitor inquired into our theories of learning, he/she would find that the
choice of subject matter in the traditional academic curriculum derives from the theory of
general transfer of training. This belief holds that the diligent study of difficult subjects like Latin,
physics, and mathematics disciplines and sharpens the mind in such a manner as to facilitate the
later study of any other field of knowledge. Early and broad acceptance of the validity of this
theory, coupled with trust in the wisdom of professional education planners to know what is best,
endowed conventional achievement tests with a special mystique and apparently immunized
them against serious challenge to their authoritative status.

It must be noted that some close relationships have been reported over the years between
superior performance on achievement tests and success in higher education and in the
professions and government service. How much of this correspondence is attributable to a
genuine causal relationship and how much to selective bias in favor of high-scoring applicants
(self-fulfilling prophecy) cannot be readily determined. Mounting skepticism has been voiced
about the meaning and relevance of conventional academic tests, much of this challenge coming
from advocates of ethnic minorities and of poor, handicapped, or non-English speaking children.
The general charge has been that standard achievement tests ignore many socially useful skills
and talents applicable outside the school and, as such, raise discriminatory barriers against the
socioeconomic advancement of the atypical student. Such tests, it is claimed, are too narrowly
scholastic and slight some of the important pragmatic products of training which business and
industry look for.

Well-designed and program-relevant tests can correct such claimed limitations. These tests
reject the normative or relative score approach to test interpretation, employing instead some
empirically established external standard which can define satisfactory training attainment. This
is the strategy of criterion-referenced measurement, and training programs which designate
specific requirements for success or competency in a skill, i.e., specified levels of mastery, are
said to be "competency based."
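
The contrast between the normative and the criterion-referenced readings of one and the same
score can be shown in a few lines. What follows is only a sketch under stated assumptions: the
function names, the class scores, and the cut score of 75 are all hypothetical, and the
percentile-rank formula used (scores below plus half of the ties) is one common convention, not
one prescribed in this paper.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced reading: where does this score fall relative to the group?"""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    below = sum(1 for s in group_scores if s < score)
    tied = sum(1 for s in group_scores if s == score)
    # Common convention: count all lower scores plus half of the tied scores.
    return 100.0 * (below + 0.5 * tied) / len(group_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced reading: does this score reach the external standard?"""
    return score >= cut_score


# Hypothetical class of five trainees and a hypothetical mastery standard.
class_scores = [40, 45, 50, 55, 60]
mastery_cut_score = 75

top_score = max(class_scores)
print("percentile rank of top scorer:", percentile_rank(top_score, class_scores))
print("top scorer meets the standard:", meets_criterion(top_score, mastery_cut_score))
```

In this invented class the highest scorer ranks near the top of the group yet still falls short of
the external standard, which is the distinction the criterion-referenced strategy is meant to
capture.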

Historically, the rationale underlying vocational education programs stands in stark contrast
to that of the older academic curriculum. At the turn of this century, the secondary schools were
typically elitist training centers for children of the privileged class. Vocational courses and
curricula were rare. Those youth destined to enter the labor force and the trades had to acquire
their work skills on the job. Large numbers of them were the targets of labor exploitation. Many
of the efforts of the social reform movement of that period were directed toward mitigating the
plight of this segment of the population.

Despite the extension of compulsory school legislation to cover older children, significant
numbers of urban teenagers left school to find needed employment. Vocational educators
pushed for occupational training opportunities in the secondary schools to counter massive
dropouts and qualify young students for entry into the labor force. One group which significantly
advanced the vocational reform movement was the National Society for the Promotion of
Industrial Education (NSPIE). It is noteworthy that the NSPIE recognized the [interdependence]
between effective programs of vocational education and career guidance services, and it was this
organization which was instrumental in siring the National Vocational Guidance Association, the
first national society devoted exclusively to the advancement of guidance.'

Given this climate of practical urgency, the philosophy of vocational education, the design of
its curricula, and its approach to the measurement of student achievement developed along
boldly utilitarian lines. The Fourth Yearbook of the American Vocational Association, which
takes the philosophy of vocational education as its theme, projects a straightforward and
unidimensional image.' There is no detailed explication of the value roots of vocational education
nor of possible philosophical agreements or quarrels with the concerns of humanistic
psychology: self-actualization, student-centered education, and the debilitating psychological
effects of alienation. Endorsement, however, is given to the importance of developing originality
and thinking ability and to the principle of individualized instruction to accommodate wide
differences in student backgrounds and learning abilities. Here as elsewhere in the literature of
vocational education, a plea is made to insure that "student performance criteria (be) based as
realistically as possible on occupational demands."

The simple pragmatism which permeates the avowed aims of vocational education makes it
particularly receptive to performance testing procedures. Yet, since educational values and goals
in a pluralistic society do not form themselves into a tidy monolith, vexing problems and
unanswered questions about the concept and practices of performance testing remain. These will
be identified later in the chapter.






Vocational Training as an Adverse Alternative

That college preparatory programs have over the years enjoyed favored status in the public
view at the expense of occupational training has produced special problems for vocational
education students and staff alike. Often considered a dumping ground and salvage operation by
academic purists and elitists, vocational schools have faced a particularly arduous challenge in
the conservation of undervalued human resources and in equipping their students for entry into
the labor market. Given these circumstances, it is not surprising that training as a tryout
experience and as a form of career guidance has held a prominent place in the goal hierarchy of
the vocational schools. Thus, the literature of vocational education frequently mentions the need
for appropriate evaluation techniques to monitor student progress and the efficacy of
instruction.*

Unlike the conventional academic curricula, where student grades have been employed as
general indicators of readiness for occupational entry or higher-level schooling, vocational
programs have been expected to furnish clearer and more direct evidence of task mastery by
students. State industry-labor apprenticeship councils and other certifying bodies now specify
minimum standards of acceptable work-related behavior in terms that schools cannot afford to
ignore. Some authorities now call for a detailed series of tests which will provide information
about the noncollege-bound student comparable to the information which the standardized
achievement test battery furnishes about the college bound. Sidney Marland, the former U.S.
Commissioner of Education who later proposed career education, wrote:

    A culminating examination should be created with all the strength and quality and
    prestige that now characterize the College Board examinations. This examination
    should include, in part, the appropriate academics of a liberalizing curriculum, but it
    should have as its principal message a measure of the quality of skilled performance in
    a given occupation that may be expected of the examinee.⁸

Taking a cue from the CEEB, Marland suggested that this new type of test be called the JEEP
(Job Entry Examination Program).

Open Admissions and Performance Testing at Risk

Two significant contemporary trends in American education, the open admissions policy in
colleges and technical schools for disadvantaged and nontraditional applicants and the adoption
of program-completion certifying examinations, appear to be on a collision course. One leads to
a substantial increase in the proportion of students with marginal skills for academic survival; the
other sets a uniform standard of acceptable learning and may produce an increase in student
failures. Many high schools have attempted to settle the problem of low-achieving students by
quietly adopting a policy of automatic promotion. Criticism of this policy has been widespread
and severe. Faced with growing percentages of high school graduates who enter institutions of
higher learning (now over 50 percent), our colleges have three choices: (a) grade inflation; (b)
watering down the curriculum; and (c) maintaining past grading standards, testing standards,
and course requirements and letting dropout and failure rates run the consequences. There is at
least indirect evidence that the first two alternatives are now being widely used, although it
would be difficult to find those who approve. The third alternative, although more forthright,
again satisfies no one, and, in addition, creates serious embarrassment for the institution.
Culturally disadvantaged students who entered the institution with high hopes may feel
disillusioned and betrayed by false promises and expectations when they fail. Students, parents,






BOROW 



and governing boards may then charge that the school does not provide useful educational
services of a reasonable quality. Moreover, the prospect of wholesale test-based failure rates in a
period of declining enrollments inevitably invites the institution's anxious attention to the
remedial needs of the marginal students. Grant has stated the case:

    From the institutional point of view, the major impact of adopting a competence-based
    approach is to shift more of an institution's resources from the best to the average and
    below-average students. Those "invisible" students, formerly given C's and D's for
    endurance and passed along, become highly visible in a competence-based format and
    no longer merely slip through the institution unnoticed. The competence approach
    forces a redistribution of faculty labor to them. A higher proportion of the faculty will
    spend more time teaching these students basic skills and helping them achieve specific
    outcomes than in traditional schools.⁹

It is clear that competence-based education and performance testing, when used to certify
student mastery of required skills and understandings, may exacerbate certain already existing
problems.

Performance Testing and Behavioral Objectives

One of the most compelling and attractive features of performance testing, when linked with
competency-based education, is its insistence on operationalizing instructional goals and casting
them in a readily observable and quantifiable form. The task of conceiving and constructing a
performance test directs specific attention to the issue of training objectives. What is it in
behavioral terms, i.e., directly observable responses, that the training program is attempting to
accomplish? Assuming that the student has acquired the techniques which provide the raison
d'être of the instructional process, what is it in specifiable terms that the student should now be
able to do and to understand? While it is, of course, true that the development of any educational
achievement tests may force this kind of close look at the purposes and outcomes of instruction,
this advantage seems especially true when the competence-based strategy is applied to the
construction of performance tests. Beyond the question, then, of how effective a performance
test may be as an instrument of appraisal, the complex act of planning and constructing it has a
potentially salutary effect on the process of instruction itself.

Let us see how the logic of competence-based education underlies the development of the
test. Since performance tests are not isomorphic with actual job performance but are at best
analogues or predictors of the latter, test researchers and technicians have had to grapple with
the question of what constitutes a workable test. They must decide what features of an
evaluation device make it administratively feasible and, at the same time, allow it to approximate
both the training objectives and behavior on the actual job.

A prior condition to be met, however, is the specification of the logical sequence involved in
the measurement of the learning itself. In brief, these steps include (a) identification of the units
of behavior which are central to performance on the job for which the training is expressly
designed; (b) selection of operational criteria matched to the units of job-relevant behavior
identified in the first step of this sequence; (c) determination of what is to be learned from the
formal training experience itself that will optimize prospects for the development of the
aforementioned behavior units; (d) statement of the learning content and goals in step c in
measurable, i.e., directly observable, terms; (e) arrangement of the conditions of training and
training performance such that extraneous variables, i.e., those not pertinent to the occupation
itself, are controlled or minimized; (f) assessment and scoring of those behaviors within the
training setting as specified in step d; and (g) statistically relating the data derived from steps b
and f. The final step in the sequence provides both a validation index of the training criterion
and, less directly, a measure of the relevance or effectiveness of training. As previously noted,
this ultimate operation is rarely performed because rigorously controlled follow-up occupational
data may be difficult to obtain and because, further, the validity of the performance tests used to
appraise training outcomes is seldom questioned.
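
Step (g) can be made concrete with a small sketch. The text does not say which statistic is meant by "statistically relating" the two sets of data; the sketch below assumes, purely for illustration, that a Pearson correlation between training-test scores (step f) and job-criterion ratings (step b) serves as the validation index. The function name, the variable names, and the six pairs of scores are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores does not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six trainees (invented for illustration only).
training_test_scores = [62, 88, 55, 70, 81, 40]   # step f: scores in the training setting
job_criterion_ratings = [3, 5, 2, 4, 4, 1]        # step b: criterion ratings on the job

# Assumed validation index: the closer to 1.0, the better the training
# scores track the job criterion for this group.
validity_index = pearson_r(training_test_scores, job_criterion_ratings)
print(f"validation index (Pearson r): {validity_index:.2f}")
```

A coefficient computed on a handful of trainees is very unstable, which is one practical face of the difficulty noted above: the follow-up data needed for a trustworthy index are hard to obtain.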

The Behavioral Objectives Controversy

Lively disagreement exists among educators concerning the merits of behavioral objectives
in performance testing. Critics argue that behavioral objectives give clarity and specificity of
educational outcomes at the sacrifice of the deeper understandings involved in learning.
Reducing complex instructional goals to a series of discrete, easily measured tasks or responses,
they believe, may barter some of the more distinctive products of human learning like creative
thinking and imagination for trivia.

In truth, behavioral objectives often appear to be excessively lean and limited in scope.
Advocates of competence-based performance criteria counter by noting the nebulous nature and
inaccessibility of global objectives. Frequently, too, analyzing a complex skill into its component
parts may afford a more effective means of planning the teaching of that skill and of measuring
the instructional product. And for complex tasks requiring mastery of a known set of identifiable
principles and psychomotor operations, as in many jobs, casting the goals of learning in
behavioral form may be quite advantageous. Still, it must be conceded that the behavioral
approach to identifying training objectives may give disproportionately heavy attention to those
which can be most readily transformed into directly observable and conveniently recorded
responses.

A specious criticism of behavioral objectives occurs when the concept is used as a synonym
for behaviorism. The behavioral objectives approach uses the behavioristic principle of specifying
behavior in terms of observable responses. However, as applied behavioral technology,
behaviorism goes far beyond the questions of how objectives are derived and stated. It deals with
the techniques for systematic behavior intervention and change through application of such
principles as classical and operant conditioning. These include positive reinforcement, aversive
stimulation, counterconditioning (desensitization), and even social modeling. Since none of these
techniques is applicable in generating the behavioral objectives for a performance test, it is
irrelevant to attack behavioral objectives qua behaviorism.



Unresolved Issues

If one accepts the thesis that competence-based education holds the promise of bringing
curriculum design and educational experience closer to relevant life experience, the potential
value of performance testing as a means of monitoring both the quality of the instructional
process and certifying student attainment of specific goals seems beyond serious dispute. To
embrace this premise, however, is to simultaneously assign increased significance to
performance tests and to invite some vexing questions about the limitations of testing and
inappropriate testing practices. Unless the urgency of such questions is acknowledged, the
performance testing movement may flounder or lose its direction and become the target of even
more strident public attacks.






A number of technical measurement problems in performance testing, including those of
validity, reliability, behavior sampling, and cutting scores, are addressed in this chapter and
elsewhere in the volume. What remains are certain unsettled issues concerning the interpretation
of performance test scores and the proper place of such tests in improving the quality of
education. These concerns will be briefly noted in the form of questions.

1. How much bearing should performance testing have on what is taught? As previously noted, a
   well-designed performance test will be keyed to teaching objectives and to the masteries to be
   achieved. Accordingly, it should come as no surprise to find a substantial correspondence in
   competency testing between test contents and the subject matter of instruction. What may
   occur in teaching practice, however, is a subtle reversal in antecedent and consequent
   conditions by which the test becomes the curriculum and the school, unwittingly perhaps,
   begins to teach for the test. Under such circumstances, a real danger exists that performance
   testing may become the basis for a new meritocracy. The best of tests offer only a limited
   sampling of the behaviors and competencies which schools wish to transmit. Although the
   aims of education may be defined in terms of test content, tests are not identical with the
   corpus of education. To arrange instructional experience so that only the contents of
   performance tests are taught would be to render the educational process static and unduly
   confining.

2. How much reliance should be placed on performance tests in making educational decisions?
   This question inquires tacitly about the confidence we can justifiably place in tests as
   indicators of students' true competence. Tests can never wholly capture the mise en scène in
   which we wish to observe the student at work and in life. While well-designed performance
   tests may provide one of the best means of judging a student's eligibility for training or for a
   vocational certificate, they fail to reproduce the full range of conditions which come into play
   when a student is adapting to post-school experiences, including employment. Hence, it will
   generally be wise to combine test information with other relevant sources of information when
   making judgments about a student's competence. An example would be the training
   supervisor's systematic and standardized ratings of a student's performance in a cooperative
   work setting.

3. Does the use of performance tests tend to unduly hasten occupational program decisions by
   students and narrow their curricular experiences? It is common to encourage high school
   students in a system of competence-based vocational education to shape their course
   selections to the skills and understandings they must demonstrate through testing. Yet the
   career plans of many of them are still unstable. The secondary school experience should be so
   arranged as to present a broad spectrum of exploratory activities for students. It should
   facilitate the process of career development rather than close it down with occupational
   training which is irreversible or too restrictive.

4. Will a trend toward increased performance testing in competence-based education discourage
   emphasis on the liberal arts? When tests are used to assess a narrow band of vocational
   abilities, the net effect is retrogressive. The need to strengthen the vocational aspects of
   education so that all students leave school with marketable skills is readily conceded. Still, as
   Willers points out in his chapter on philosophical issues, the more specialized career goals are
   defensible only when they are derived from and articulated within a comprehensive system of
   general educational goals. It follows, then, that occupationally oriented performance testing
   should bear a kinship to tests of competence which are linked to the aims of broad, general
   education.






5. Does the teaching and testing of standard operating procedures and a fixed body of
   knowledge tend to promote rote learning and discourage creative problem solving? Note was
   taken previously of the behavioral strategy of stating the performance outcomes of training in
   crisp, directly observable terms. This approach to specifying objectives offers obvious
   advantages but, at the same time, tends to load tests with fragmented, static, and
   closed-system contents. There is need for experimentation with performance test item types
   which stress broad conceptual relationships, logical reasoning ability, and originality in
   problem solving.

6. Are individual students sometimes the victim of unfair decisions based solely on low
   performance test scores? A qualifying examination which possesses at least moderate
   predictive validity will classify students (for purposes of program admission or program
   completion) with a degree of accuracy substantially greater than chance. Furthermore, for test
   applicants as a group, the average discrepancy between predicted and actual performance on
   the criterion measure will be significantly smaller than errors resulting from guesswork or
   those resulting from traditional screening interviews and letters of recommendation. For many
   years it has been this empirically demonstrated ability of valid tests to outperform older
   screening methods that has justified their use in making classification decisions about
   students. However, unless a test has perfect validity, a condition which never occurs in reality,
   some students will always be misclassified by the test scores. It has been recent challenges by
   student candidates who have apparently been able to show that they possessed the
   competency denied by their low test scores which have brought the issues of test fairness and
   competence-based education to public attention. As the chapters by Pullin and Tractenberg
   show, accountability through performance testing entails a number of thorny ethical and legal
   considerations, and the controversy remains unresolved. But for many test designers and
   users who must deal realistically with the state-of-the-art limitations of measurement devices,
   criticisms of competency testing often appear too severe.¹⁰ Until more accurate methods of
   certifying student performance can be developed, they ask, does it not make sense to use the
   most accurate testing procedures available, procedures which minimize classification errors?
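
The claim under the last question, that a test of less than perfect validity beats chance yet must misclassify some students, can be illustrated numerically. The sketch below rests on assumptions the text does not make: test score and criterion performance are taken to follow a bivariate normal distribution, and both are cut at their medians (pass or fail, competent or not). Under those assumptions only, the proportion of students classified consistently is 1/2 + arcsin(r)/π, where r is the validity coefficient. The function name is hypothetical.

```python
from math import asin, pi


def agreement_rate(validity):
    """Proportion of students whose pass/fail status on the test matches their
    status on the criterion, assuming bivariate normal scores that are both
    cut at the median (an illustrative assumption, not the chapter's model)."""
    if not -1.0 <= validity <= 1.0:
        raise ValueError("a validity coefficient must lie between -1 and 1")
    return 0.5 + asin(validity) / pi


# Tabulate consistent classification and misclassification for several
# hypothetical validity coefficients.
for r in (0.0, 0.3, 0.5, 0.7, 1.0):
    rate = agreement_rate(r)
    print(f"validity {r:.1f}: consistent {rate:.3f}, misclassified {1 - rate:.3f}")
```

Under these assumptions a test with a validity of 0.5 classifies two students in three consistently, well above the one-in-two rate of guesswork, while still misclassifying the remaining third; only a validity of 1.0 removes misclassification altogether. Other cutting scores or score distributions would give different figures.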






Notes

1. Lee J. Cronbach, Essentials of Psychological Testing, 2d ed. (New York: Harper & Brothers, 1960),
   pp. 103-23.

2. J. Alan Thomas, "Cost-Benefit Analysis and the Evaluation of Educational Systems," Proceedings
   of the 1968 Invitational Conference on Testing Problems (Princeton, N.J.: Educational Testing
   Service, 1969), p. 90.

3. W. Richard Stephens, Social Reform and the Origins of Vocational Guidance (Washington:
   National Vocational Guidance Association, 1970).

4. William Micheels and Ray Karnes, Measuring Educational Achievement (New York: McGraw-Hill
   Book Company, 1960).

5. Melvin L. Barlow, ed., The Philosophy for Quality Vocational Education Programs, 4th Yearbook
   (Washington: American Vocational Association, 1974).

6. Tim Wentling and Tom Lawson, Evaluating Occupational Education and Training Programs
   (Boston: Allyn and Bacon, 1975).

7. Richard Erickson and Tim Wentling, Measuring Student Growth: Techniques and Procedures for
   Occupational Education (Boston: Allyn and Bacon, 1976).

8. Sidney P. Marland, "A Customer Counsels the Testers," Proceedings of the 1968 Invitational
   Conference on Testing Problems, p. 111.

9. Gerald Grant et al., On Competence: A Critical Analysis of Competence-Based Reforms in
   Higher Education (San Francisco: Jossey-Bass, 1979), p. 11.

10. Ralph W. Tyler and Richard M. Wolf, Crucial Issues in Testing (Berkeley: McCutchan
    Publishing Corporation, 1974).



PHILOSOPHICAL 

ISSUES 



Jack C. Willers
George Peabody College
for Teachers of Vanderbilt University
Nashville, Tennessee

"A workman is commendable, not for the will by which he works, but for the quality of his
performance." With this perspective, the thirteenth-century philosopher Thomas Aquinas
illumined a basic value of western civilization. Even if it be true that "where there is a will, there
is a way," the fundamental and final criterion of a working person is the quality of his or her
work performance.

Yet there has always been another perspective, not diametrically opposed to the quality of
performance, but placing its higher hopes in pure theory, in intellectual contemplation as an
intrinsic and ultimate value in and of itself. The history of education, of our civilization, and of
our nation is the story of the conflict of this counterperspective with the values that give highest
priority to the quality of performance and product. Today, the history of that conflict between
thinking and doing may be seen in the issues in performance testing.

Should educators limit themselves to the basic cognitive tasks of reading, 'riting and
'rithmetic, so that formal learning in schools will be restricted to the intellectual skills necessary
for academic scholarship? Or, is the primary purpose of education to instill a sense of the
competitiveness of social and economic realities and, accordingly, to prepare students to
perform their best in the worst situation? Or, again, should the schools place a higher priority on
recognizing the inherent worth of childhood and youth, not as periods of preparation for some
unknown adult future, but as time for joy and celebration, for self-expression and good feelings
about one's self?

These questions are, admittedly, phrased in ways that educational theorists would never
propose for their purposes and programs. This outlandish manner, however, is not to disparage
the serious enterprise of thinking critically about schools and teachers and learners. Instead of
belittling the difficult but necessary task of asking hard questions about education and human
development, we must at times ask them in taunting, jeering terms to reveal their underlying
narrow assumptions and myopic prescriptions.

No single educational philosophy or program will meet even most of the needs among
individuals in a multicultural, pluralistic society characterized by competing interests and
conflicting values. Still, educators easily become infatuated with fads; infuriated by failures;
inflated with easy, fleeting successes; and more often than not, infected by the infallibility of our
own purposes, perspectives and programs.




WILLERS 



All educational reforms, therefore, such as performance testing of competency-based
instruction, deserve the critical review, as well as the experimental testing, that gives them the
opportunity to prove and to improve their own performance. Whatever the philosophic bases of
performance testing, it must at least open itself to the tests of performance and subject itself to
critical evaluations according to fundamental, often conflicting and certainly competing, values.

Evaluation by performance of psychomotor, job-related skills will certainly not receive the
wholehearted support of classical educators. Nor is performance testing enjoying the firm
support of humanistic educators who emphasize the values of play and leisure and inner
self-direction. Especially critical of performance testing today are those educators who are
sensitive to the apparently destructive forces in modern technology that deplete our natural
resources, pollute our environment, disturb delicate ecological balances, and exploit and
dehumanize skilled working people for profit and power.

Arguments for fairness in testing notwithstanding, these critics and skeptics have legitimate
messages of caution. In general, these issues speak to the limitations, narrowness and
inadequacies of performance testing when overstressed or used to the exclusion of other claims
and interests. Though threatening to narrow self-interests, such critical messages of caution can
provide clarity and breadth of purpose together with insights into other worthy means of judging
human development and achievement.

From Analytical Definition to Critical Judgment

A performance test is presumed to be a measure of occupational competency or the ability
to perform a job-related skill. This presumption, in turn, is based on the assumption that job
skills, and even overall occupational functions, can be reduced by analysis to meaningful,
manageable and measurable sequential elements. The competent performance of these work
segments may then be examined and evaluated. The purpose of performance testing,
accordingly, is to discern the quality of a particular individual's competency to perform a
particular job-related skill or to qualify for a particular occupation.

Human beings, accordingly, are selected for additional training, jobs, careers and even
certified for various occupations (in other words, granted the rewards for their
social usefulness) on the basis of others' critical evaluation of their competence as indicated by
the measurement criteria of performance tests. This analysis differs from the
straightforward proposition that people are selected and rewarded on the basis of their own
actual performance on the job or in the occupation. Test designs, test criteria, job descriptions
and occupational analyses, test constructors and their judgments on what to measure and how
to measure it, test administrators and test evaluators all stand between the individual performer
and the rewards dispersed.

Furthermore, a performance test, providing an adequate basis on which to judge the degree
of proficiency with which an occupational competency is performed, must also provide
quantitative measurements by which the more competent craftsman may be distinguished from
the less competent. Thus, not only does the analytic reduction of work sequences underlie
performance testing, but also the measurement and evaluation of job-related skill competency
require the quantification of qualities of both performance and product.

A well-defined objective is essential to performance testing. That objective may be the ability
to perform a manipulative skill to a certain qualitative degree, or to produce a final work product
that meets certain standards of quality control. In either instance, it is not merely the
performance process or the product of the work-sample that is evaluated, but rather the
competency of the work which is critically assessed.

The purpose of analyzing and characterizing performance testing in the above manner is not
to establish either a working or final definition of performance testing on which all may agree.
The above analysis does make clear, however, that regardless of the technical definition or the
characterization of performance testing used, performance testing, like all other forms of
evaluation, inevitably must make assumptions about reality and human experience. It must claim
some value criteria for the discernment of quality and the judgment of degrees of quality. And
performance testing must reflect some beliefs about how we learn, about how we demonstrate
and apply knowledge, and about the values assigned to that knowledge.

This perspective places performance testing squarely in the philosophical domain of critical
interpretation of beliefs about reality, values, and knowledge. Presuppositions and fundamental
value beliefs require identification, clarification and criticism. Basic concepts of reality,
intelligence, and social utility must be questioned, or at least held critically, and applied
cautiously in diagnosis, evaluation and justification. The assessment of performance
competencies from this perspective is a human affair, not mechanical, not prescribed or
determined, but subject to the whim and prejudice or capriciousness, as well as to the
reasonable disagreements of rational people.

The argument may appear strained and unnecessary to those who already acknowledge the
human elements and the concomitant possibilities of error in performance testing. On what
logical or utilitarian grounds is an individual justifiably subjected to performance testing? Are
there other, better reasons for not testing performance? Is such testing a subjection to external,
impersonal norms that are less valuable or substantiable than others? Or is performance testing,
rather, an individual opportunity to express unique human dignity, to excel, to learn about and
respect one's own self?

But these and many other philosophical inquiries do not suggest themselves to those who,
with a deterministic or mechanical perspective, view evaluation in general and performance
testing in particular as the automatic process of perceiving degrees of quantitative variance or
correspondence between two sets of clearly observable data: the external test standards and the
behavioral performance. From this perspective, no values, interpretations, judgments or
responsible assumptions are expressed in constructing performance tests, evaluating their
outcomes, or even in the decision to administer them. To the contrary, performance testing is a
value-free maneuver, a technical operation freeing both the tester and the performer not only
from capricious judgments of the quality of competency but, more significantly, also from all
questions of fairness and justice in allocating economic rewards and social recognition on the
basis of performance. The only problems or issues related to performance testing from this latter
perspective are the technical questions of test validity and reliability.

However, even the troublesome, tentative question of whether performance testing is, on the
one hand, a human interaction, consisting of purposes, intentions, social goals, culturally defined
criteria, and theoretical assumptions or, on the other hand, a value-free mechanical operation, is
itself a question which justifies, even requires, discussion regarding the philosophical issues.







Educational Goals and Performance Testing

The aims and goals of education provide a perennial pursuit for philosophic perspective. The
value of life and the values worth seeking and living for are constant questions perplexing the
critical mind. This has always been true but appears even more so in an era committed to
science and technology, neither of which in itself purports to define our values or to solve our
value conflicts. Indeed, from one debatable view, science and technology, while claiming to be
value-free, have called into question our more stabilizing, traditional values, thereby creating
many of our value conflicts and dislocating core values necessary for social cohesion and
continuity.

In an age in which science and technology of overpowering dimensions dominate the
curriculum, what is education for? What are the values and goals sought through the myriad
forms of instruction, training, programming, conditioning, teaching and testing? If there is some
answer to this question, it would necessarily be complex, but even then we would have only a
description of the various social and personal goals people strive to achieve through learning.
More crucial is the normative question: What ought to be the aims of education? From differing
responses to this primary question follow the practical matters of designing curricula, applying
instructional methodologies, organizing and administering learning situations, and evaluating the
results.

The question of aims, like all normative questions, cannot be answered in any final sense,
only in terms of philosophic perspective to which there would be equally appealing or more or
less defensible counterperspectives. This is not the place to argue for this or some other goal of
education. But to place the question in terms relative to performance testing, let us at least
propose a theoretical framework from which to work. This approach is attributed to Thomas F.
Green and can be pursued in greater detail and accuracy in his "Minimal Educational Standards:
A Systematic Perspective."¹

The aims of education may be either general or specific. Specific educational objectives
indicate that which constitutes their own achievement and also designate the time when the
goals are to be achieved. In this respect, performance-based training always aims at specific
goals in the form of behavioral objectives, and it is the function of performance testing to
indicate when and to what extent these specific goals are attained.

General educational goals, on the other hand, are vaguely expressed so that it is never
possible to discern when they have been attained or the extent of their attainment. Accordingly,
no form of educational measurement, perhaps least of all performance testing, can measure the
achievement of general educational goals. For this reason, and because our culture places such
great emphasis on measuring and counting for the purpose of efficiency and economy, it is
suggested by the accountability movement, competency-based instruction, and the efforts to
manage education by objectives that all seemingly useless general goals be replaced by specific
objectives, the attainment of which can be measured, monitored and managed.

But the argument to eliminate general aims, in favor of the specific, rests on a
misunderstanding of the function of general goals, which is to designate, not what is the good or
the best, but rather what is unacceptable. As such, the general goals of education provide the
grounds for defining specific educational objectives.

In short, general goals operate effectively in the establishment of specific targets provided
we recognize that their function is to provide criteria for determining what kinds of arguments
will constitute serious charges of failure. Specific educational goals are derived from general
educational goals through a social process in which there is produced a definition of what
constitutes not the best, but the worst that is acceptable. To suppose that specific goals for the
system can or must be generated independently of general goals is to succumb to a most
fundamental misunderstanding of the nature of educational goals.

The achievement of specific aims is the business of competency-based instruction, and the
measurement of that achievement of specific aims is the business of performance testing. Now,
the problem remains as to whether these specific aims depend upon the more general aims of
education. Or have competency-based instruction and performance testing replaced general
aims with specific performance targets arising from the art of the possible in instruction and the
prescription of behavioral objectives? Unless specific goals depend on the general aims of
education, performance testing, and its array of associated educational movements, will drive us
further into an educational malaise of confusion and lost confidence.

If specific goals are not related to more general goals that express broad social values and
shared ideals, narrow interests will continue to compete ruthlessly. Unsuspecting learners,
striving to improve their own performance, will be caught up in the competition to exploit their
improved competency, and schools and educational systems will continue to be condemned for
lack of efficiency or productivity or almost any other failure, presumed or real, on grounds which
are irrelevant because they do not reflect general goals of the society or the system.

The discussion seems to have generated another dilemma for performance testing:
Competency in performance cannot be tested fairly unless it is an established and measurable
objective of instruction. Such an objective is necessarily specific, designating the specific criteria
for the evaluation of its own attainment. The behavioral objectives of competency, furthermore,
emerge directly from particular job-related skills, not from broad cultural aims and values or from
general social ideals and goals. And yet, as it has been argued, it is exactly these kinds of
narrow, specialized performance goals that endanger the society's sense of commonalities and,
consequently, the individual's relationship to that fragmented society.

Studies regarding individual alienation and dehumanization need not be recounted to
strengthen the argument against specific instructional goals sought in isolation from broad social
ideals and general educational aims. But the tragic picture does flash across the screen: a highly
proficient person, competent in a variety of economically useful skills, who possesses little or no
sense of individual or social identity, self-worth, or meaningful direction for life. Such a person
skillfully fells the trees without ever seeing or appreciating the beauty of the forest. The
concomitant destruction of our physical environment and the senseless waste of our natural
resources almost matches the loss of human resources.

With respect to the dilemma, some uneasy compromise between the demands of the
technical and the necessities of the human may provide some small consolation. The
compromise will not satisfy those who give highest priority to the inner dignity of the person
rather than to creature comforts and increases in the gross national product. But, given the
present power of the continuing persistence for consumption over creativity, something by way
of compromise may be better than nothing at all.

This possibility of compromise lies in the hands of vocational and technical trainers who
construct behavioral objectives and utilize performance tests. To these educators and evaluators
falls the opportunity at least to refer the specific goals of training to broader social and
educational aims.



In what respects, it might be asked, does the attainment of proficiency in some performance
respond to the broader, generally accepted goals of our culture to encourage individual
creativity, to foster conservation of our natural environment, to develop critical yet cooperative
citizens, and to stimulate a sense of self-identity as well as a sense of belonging? How might the
assumed opposition of habituated skills and creative imagination, of routine work and expressive
leisure be reconciled into mutually complementary counterparts?

To ask such questions and to begin to answer them and to apply partial answers in actual
practice, requires that the vocational education evaluator become an educational sociologist,
historian, and philosopher, able to recognize and critically evaluate the general aims of education
in order to give meaningfulness to specific objectives. Above all else, for social concord and
individual human development, it is necessary that the performance instructor and evaluator
judge far more than the skilled performance, and that the learner learn far more than
performance skills.



Performing Slaves— The Perennial Fear 

Philosophical issues converge on performance testing from across the spectrum of
educational thought, even from opposite directions. From the radical end of the continuum,
neo-humanistic educators, third-force psychologists, and existential philosophers rail against
imposing external standards on unique individuals who are free to choose their own values and
destinies. On the other hand, educational fundamentalists, the perennialists, would return our
modernized, mass, corporate culture back from the vocational training of slaves to enduring
universal truths and values which serve as absolute criteria for human behavior, action and
performance. For these educators, the aim of schooling is "manhood, not manpower."

From this latter perspective, human performance is not to be measured in terms of individual
interests or needs, for all people possess a common natural power for rationality.
Education must accordingly rely on the universal and the permanent, not the particular and the
transitory. Nor is human performance to be measured in terms of particular marketable
vocational, technical and professional skills which, apart from the power of rational judgment,
mark our society's performing slaves. Corporate industrialization, technological advances, and
the observation of changing facts, all served by performance training and testing, readily enslave
the skilled in whom the potentiality for rational self-direction remains unrealized.

Thus Robert Hutchins, in advocating perennialism in education, rejected outright most of
the commonplace objectives of American schooling today, and especially training in vocational
competencies. Since a system of education will invariably reflect major cultural forces, he
argued, it would be naive to think that the schools could develop intelligent humans when all
social pressures are applied to the development of uncritical, unthinking consumers and
producers. Our cultural mission must, therefore, be redirected, away from national power and
accelerated technological changes that take no thought of rational human progress or social
consequences, toward wisdom, understanding, intelligence, and rational thought and judgment.
To realize our rationality and thereby reach our full human potential, a liberalizing, freeing
education must be provided to all, "not to make practitioners but to help in the development of
intelligent men and women."

One could probably argue well that there is nothing in performance training and testing that
is inherently contrary to intellectual development itself. And tests can and have been intelligently
developed that do measure performance abilities and competencies. But, then, those who make
such successful arguments, and those who construct such reliable and valid performance tests,
must be utilizing a degree of rationality that the skilled performer may not have had opportunity
to develop. And this possibility is the crux of the issue. While it is not a matter of either rational
judgment or skilled performance, it is a question of priorities. Skilled performance without
critical, rational intelligence becomes, in a world of rapid technological change and built-in
obsolescence, a prelude to participating in one's own victimization. Just as education reflects the
dominant forces of the culture, so we teach toward the tests. And performance testing presents a
danger of luring job-skills with instant yet transitory reward.

Still, it may be argued, it is better to possess any marketable performance skill than none at
all or, what may be worse, an impractical, purely contemplative intellect (if there is such a thing).
But this argument forces us back into an either-or dichotomy by denying all other alternatives to
the extremes: either the performing slave or the intellectual who would be free if he/she knew
how to do anything at all other than think his/her own thoughts. But these two alternatives are
far from exhausting our human possibilities, and, besides, performance often depends on creative
or critical judgment and cognitive knowledge that no strict performance test alone can measure.

The intelligent, creative and critical worker is, therefore, no threat to vocational education or
performance testing. Rather, she or he is the challenge.

Self and Society— The Continuing Split

Performance of skills and evaluation of performance may be viewed from the perspectives of
three domains commonly used today to classify educational objectives: the cognitive, the
psychomotor, and the affective. Performance testing is primarily, though not exclusively,
concerned with the measurement of the achievement of psychomotor objectives and
competencies. As we have seen, educational fundamentalists are concerned with the cognitive
actualization of rational potentiality. From the third domain, the affective, philosophical issues
converge upon performance testing.

These issues, raised by humanistic and existential perspectives, center on the conflict
between external controls or stimuli, pressed upon learners from without to modify behavior and
habituate performance, and the free inner choices of autonomous individuals. Furthermore, these
issues focus on the legitimacy of criteria for the evaluation of learning and performance. For the
sake of economy, efficiency, and social expectations, can standardized, uniform criteria be
applied equitably through performance tests to evaluate unique individuals and the worth of their
novel abilities, achievements, contributions, and potentialities?

In an even deeper sense, the issues emerging from concerns with the affective domain for
the unique worth of human individuality raise fundamental philosophical questions regarding the
nature of reality and the sources of truth and goodness. Are human beings essentially, naturally
social beings whose originality and uniqueness emerge through varieties of social experience? If
so, we may legitimize some social expectations and cultural norms as criteria for individual
development. But the primacy of individual subjectivity over social expectations and external
standards continues to be philosophically affirmed. And, to the extent that such philosophical
arguments possess some admissibility, standardized performance testing will be questioned, and
the objective criteria for evaluating performance will be challenged.

John Dewey advocated learning through individual participation in group social
problem-solving activities using scientific inquiry and experimentation. This pragmatic approach
is based, theoretically, on the interaction between the individual and the sociophysical
environment. Thus, for Dewey and the pragmatists there is no ultimate separation of reality into
the subjective and the objective, and therefore, presumably, no inevitable contrary claims of
individual subjectivity and external social expectations. But traditional patterns of thought still
hold stronger sway over most contemporary education, and the bifurcation of reality into two
competing realms continues to dominate approaches to testing and curriculum design.

For one thing, science in the twentieth century has not been utilized as the intellectual and
democratic means to achieve the human community envisioned by liberal pragmatists. Instead,
contemporary experimental science has become the handmaiden of technology. In such
master-slave relationships among disciplines and cultural forces, free inquiry, intellectual
development and social reform usually suffer the consequences of unchecked self-serving
interests. Thus, today science is put to the services of many technological projects whose likely
consequences may be detrimental to long-range human interests.

Furthermore, the scientific community has not opened to the masses of contemporary
society. Even if we do benefit economically or militarily from the technological applications of
scientific advances, on the whole, we are generally excluded from the inquiry and
experimentation and have little say in the social uses to which scientific findings are put.
Therefore, scientific inquiry, as advocated by those who reject the dichotomies of the subjective
and the objective, of the individual and the social, has not yet emerged as the means of
participating in, and contributing to, the direction of human affairs.

The broad cultural consequence for education and evaluation is that we live in a
technologized, industrialized world with loyalties, beliefs, and values characteristic of premodern
modes of thought. We live daily amid the external securities and conveniences of creature
comforts produced and serviced, sometimes efficiently [...]
bureaucracies that fragment, dehumanize, and alienate. Yet we also still feel some worth for
ourselves and for our humanity, despite our strong dependencies on institutions, systems, and
gadgets that we may know how to manage but doubt we can control.

The performance testing movement, also, will struggle with these [...]
humans be treated and tested merely as reactive objects [...]
evaluated from without? Contrariwise, how can performance testing serve the [...]
purposive learners who creatively choose their own competencies and the qualities and social
uses of those competencies? Has performance testing already succumbed to the prescriptions
and reductionism of narrow scientism that seeks only to condition and control the
predeterminants of performance? Or rather can performance testing be complemented by
introspective self-analysis and self-evaluation of individual intentions, plans, volition and
purpose? Will [...] directed toward performance testing facilitate the individual imagination
and creativity necessary to construct novel understanding and appreciation of quality
[...]? In other words, will the performance be taught and tested in such ways that it will
serve the needs and interests of the learner, or must the learner serve the inflexible demands of
the tests?



Ultimately, these humanistic concerns challenge the functions and uses of performance
testing to recognize that those skilled performances are not just economically rewarding and
efficient. The performances most worth performing also serve the psychological renewal and
self-actualization of the individual.



The Sacrifice of Reality

The problem of distinguishing reality from the mere appearance of reality is as old as
philosophic inquiry itself. Some philosophers have argued that what merely appears and is
fleetingly perceived is only temporary and thus unreal, not to be confused with the enduring
reality of the underlying form. Other philosophers have defined the very essence of reality in
terms of what is perceived, while still others conclude that what is, what exists, cannot be known
at all as it is, in and of itself, but rather only as an object of knowledge complying with the
categories of human understanding. As such, the question of the nature of reality may raise little
interest except among philosophers who value disinterested inquiry into esoteric and irresolvable
problems.

And yet the problems of reality and its theoretical distinction from appearance constantly
show up in practical, everyday situations, especially in regard to public policy issues in
education and evaluation. Performance testing is no exception.

The evaluation of performance is a costly and time-consuming enterprise. Thus, it becomes
a practical matter to attempt to reproduce the reality of a job situation through laboratory
simulation.

"While most developers of performance tests strive to retain an element of reality by
creating work samples or simulators, there are times when reality must be sacrificed in
the interest of efficiency or in the interest of measuring certain mental processes that
cannot be measured conveniently in any other way . . . They are quick and easy to use,
they do represent important elements of the troubleshooting task, and they can be
used in locations where the real equipment cannot. They suffer from their representing
only part of the total real environment."

One might add that simulators also suffer from the uncertainty of how well, or to what extent,
those parts of the real environment are actually represented in the simulated environment.

And no matter how "realistic" simulation appears in performance testing, the performer
being tested may have the notion that, except in terms of the evaluation results, the simulation
itself "really doesn't count." Efforts to research this problem empirically or experimentally face
the difficulty of gathering data and controlling variables of appearance or perception rather than
of reality and actuality. Thus, one could never know whether or to what extent the notion of
unreality in simulation contributes to or detracts from quality performance. In either case,
nevertheless, the reliability of performance tests relying on simulation suffers some unknown
degree of distortion due to the "sacrifice of reality." If the performance within a simulated
environment does not matter entirely in reality, the performer may be either less cautious or
more relaxed, resulting in either better or worse performance.

This, of course, is certainly no devastating argument against simulation and simulators. No
one would want to fly in an airplane whose pilot had been licensed only on the basis of
pencil-and-paper tests that examine knowledge about technical data, nor would any of us want
to be operated on by a surgeon who had never before used a scalpel. Still, the inevitable
divergencies from reality in performance testing should serve as warnings of limitations and
reservations. Just as the experimental scientist recognizes that data only approach, never
achieve, accuracy, and that the findings are merely probable, tentative, and relative, so also
evaluations resulting from the more or less accurate (or inaccurate) measurements of
performance in simulated reality cannot be absolutely conclusive, and should not be acted upon
or applied as such. Consequently, assessments should be made through a variety of performance
tests other than those using simulation, and through means of evaluation other than
tests. Again, the argument strengthens the contention that the measurement of manipulative
skills alone, to the neglect of intellectual and human relations skills, jeopardizes the entire
process of evaluation.



Broader Horizons 



The philosophical issues of reality in performance testing expand into ironic complexity. The
evaluation criteria in performance testing take the form of behavioral objectives derived from the
process of analyzing actual on-the-job skills. One is successful in performance tests to the extent
that competencies in job-related skills can be demonstrated, that is, to the extent that the
behavioral objectives of vocational or technical training have been achieved. The trainee is held
accountable in terms of these behavioral objectives. If the extent of demonstrated skill
proficiency is adequate to some agreed-upon standard, then the trainee is licensed, awarded a
credential or awarded a certificate or diploma, and hired or promoted and otherwise rewarded for
levels of proficiency achieved.

It is a well-known, but slightly understood, fact that from analysis of job-related skills, to the
definition of behavioral objectives, to the design of competency-based curricula, to the testing of
performance and, finally, to accountability or certification, this training/evaluation scheme
locates its fundamental theoretical roots in behaviorism. For behaviorists, all behavior is reactive,
a response to stimulation from the environment. And all learning is a conditioned response to
external stimuli. Reality consists of external contingencies and observable behavioral responses
to them. Therefore, behavior, including competent performance, argue the behaviorists, can be
conditioned, controlled, and predicted by managing the environmental stimuli.

It is not the purpose here to provide a definitive critique of the behavioral theory of learning
or behavioral technology. It is sufficient to emphasize the behaviorists' reliance on a concept of
reality as external and objective, independent of inner mental states and subjective psychic
processes that cannot be observed or measured.

The ironic point is that those educational endeavors reliant upon behavioral theory and
technology, such as management and accountability by behavioral objectives, including
performance testing, cannot afford to surrender the reality from which stimulation, control, and
the criteria for evaluation all arise. More specifically, the behavioral techniques utilized in training
and testing for competency cannot have it both ways. They cannot exclude from reality, or at
least serious consideration, all that cannot be observed and measured, and at the same time, for
the sake of convenience, efficiency and economy, sacrifice even in part the external reality that is
all that remains.

The argument is not that behaviorism and performance testing are wrong in the sense that
the theory does not work in practice. Each of us, as a matter of common sense, is only too well
aware that our behavior is automatically reactive to external stimuli, and that learned behavior
can be uncritically responsive to social conditioning and external reward. We are even gratified
that this level of learning through operant conditioning is possible. There is no time for
speculative or critical thought when it is past time to slam on the brakes.

Life would be wholly unmanageable if we did not perform most routine and repetitive tasks
automatically, without forethought and reflection. Otherwise, we would have to learn and relearn
trivia constantly. Survival would then be impossible; or even if it were possible, we would have no
time to reflect upon the reasons, purposes, goals, values, and meanings of surviving in the first
place. If behaviors could not be conditioned by responses to external realities, many
handicapped and retarded persons could not perform the many tasks of living that most of us
take for granted every moment.

Therefore, the argument is not that we cannot, or even that we should not, train and learn
and measure behaviors or performances in accordance with the science and technology of
behaviorism. Rather the point is that the theory underlying the concepts and practices leading up
to, and following from, performance testing is inadequate to the degree that it must trade off part
of the authentic external reality fitting the job scene for another that, by comparison, only
simulates or approximates the appearance of the original.

The answer to this theoretical, if not ethical, dilemma is, of course, not to give up
competency-based instruction and performance testing. To do so would render our society and
economy totally unmanageable. Instead of giving up the behavior-oriented aspects of
competency training and testing, these could be opened up to yet broader aspects and methods
of human development, education and assessment not covered by behavioral technology.

For example, humanistic and existential concepts of human nature and behavior, involving
free choice, self-direction, and self-evaluation, might be brought to the fore. In performance
testing, at least, this broader approach requires that the performer be in control in the sense that
he has made a deliberate and critically intelligent choice to be evaluated on the basis of a clear
comprehension of the tasks to be performed and the criteria to be met. Performance would be
viewed and valued as that of a human being with feelings, aspirations, and worth not wholly
circumscribed by that performance. In addition to behavioral competencies, human relation and
affective skills would be encouraged and rewarded along with critical, reflective intelligence and
aesthetic appreciation. The individual skilled worker then is not easily exploited by mass
corporate systems, and human life takes on meanings that extend beyond technical proficiencies
and occupational settings.

In these broader terms not limited to the independent realities of external stimuli, but
including a sense of individual self-worth and pride in proficiency, performers are not subject to
impositions that they themselves cannot evaluate, control, and redirect. Their own reality is not
reduced to a series of automatic reactions to impersonal conditions and relationships.
Performance becomes a way of expressing, realizing, and becoming one's own truer chosen
self, not a demonstration of one's ability to meet the expectations, achieve the requirements, or
acquire the rewards of others.



It may be that thQse who strongly advocate performance testing, and especially those who do so 
uncritically, do so because they discern the performance of the person in the same sense as the 
performance of a machine designed to operate in some specific fashion. Certainly such a propensity 
to equate various meaninga of ^'performance" could be understood, if not predicted, especially among 
vocational and technical educators and occupational evaluators who work with machines, teach 
individuals to use machines, and teat individuals' operations of machines. 

If one would not have tuch ej^pectationi or make such predictions of artists, it is not 
because the artists are better than the vocational ists. Indeed, thejtwo may be one. But, each 
•pproaohea performance with a different mentality and a different set of presumptions. Workers 



Performing Individuals and Individual 




ormance 



43 



WILLERS . ~ 

use their tools and machines to produce a product or to provide a service; artists use iim.i 
Instruments or mediums to express and create feelings, to Interpret and convey meanings and 
Intensions, to provide pleasure and to enjoy the performance. 

In the world of work, it is a small but significant step from machine to machinist, to view the
performance of the machinist as an extension of the function of the machine. In this sense the
machinist merely completes the otherwise incomplete machine. Thus, the performance of the
machinist would be seen as being of the same class as the performance of the machine.

It is this mechanical sense of "performance" that underlies performance or
competency-based education. But the performance of a teacher or a worker, or any person, is not
the same as the performance of a machine unless one makes no conceptual distinction between
persons and machines. Then, and only then, could their respective performances be considered
identical.

The performance of a machine must accord with the design of its own production. The sense
of an individual's performance "applies to any action of a person who has parts he makes answer
to the parts of the work performed, and connects in ways that correspond to relations of the
parts of the work." Furthermore, the performance of a person differs from the performance of a
machine in that the former depends on the intention of the performer to engage in it. Since this
distinction between the performance of a human and that of a machine depends on a theory of
human nature as intentional, it somewhat begs the question and is certainly in no sense
conclusive. Nevertheless, it is just enough to warn against equating mechanical performance with
human performance and thereby applying the same criteria to the evaluation of each.

If work performance cannot be taken for granted as mechanical action, that is, as uncritical
application of rules or habits, then at least the theoretical foundations of performance testing are
thin and scarce. The performances of machines are not valued intrinsically in and of and for
themselves. Mechanical performances are rather valued for their convenient and efficient
instrumental functions. Their values lie in their instrumental uses for our own human purposes.
What is valuable in human performances does not entirely, at least, depend on this instrumental
relationship to our own human interest, or rather cannot do so without rejecting the inherent
worth of the individual. It does little good to argue for the inherent worth and dignity of the
individual performing and the instrumental value of individual performance. Immeasurable
injustice and suffering are historically rationalized by separating the person from the
performance, granting intrinsic worth to the individual and mere instrumental worth to the
person's "mechanical" performance. Human performance interprets and expresses, some would
argue, not only the work patterns and products, but more importantly the meanings, purposes
and intentions of the person who, contrary to popular contemporary behavioral [...],
cannot, or at least should not, be reduced to a repertoire of measurable, controllable, predictable
behaviors.

Relationships of Parts and Wholes

One clear, but problematic, assumption underlying performance testing is that the practice
of an occupation is the sum of the tasks into which that occupation has been analyzed, and
further that competency in the vocation can be achieved by learning separately to perform the
individual tasks, regardless of their number or nature. Within this assumption, the performance
task that is tested is to the vocation as a part is to its whole.

Now the relationship among parts, and in turn their relationships to their whole, may appear
at first glance to be simple and straightforward. In some cases, such as with the legs of a chair
and the chair itself, the relationships may be comparatively uncomplicated, though a designer or
manufacturer of chairs may argue otherwise. The relationships among human behaviors, and
especially behaviors relating several or many humans within a system such as a school or job or
entire vocation, can become complex and complicated beyond the point of mere description.

Extending beyond the empirical description, philosophic inquiry has ever been intrigued and
challenged by the question of complicated relationships of parts to wholes. In logic, it is
fallacious to argue that the qualities of the parts also characterize the whole, or conversely, that
the nature of the whole characterizes each individual part. In experience, this may or may not be
the case but, if so, never by logical necessity. Of course, philosophy is notorious for its
conflicting perspectives, so it comes as no surprise that some philosophic theories prize unity
among parts and within wholes, while other pluralistic notions perceive incongruities, if not
conflicts, among at least some relationships. Monistic perspectives of unified reality value order,
continuity, regularity, and lawfulness among human behaviors and social relationships. Others
argue for at least the possibility, if not the desirability, of the diverse, the spontaneous, the
innovative, the creative, and the unpredictable.

There is no reason to assume that those engaged in performance-based instruction and
performance testing intend deliberately to enter this metaphysical squabble. On the contrary,
vocational educators use these training and testing techniques for quite pragmatic reasons that
go far beyond, or never approach, the desire to argue, or even discuss, a metaphysical notion
regarding the relationship of parts to wholes, or a social theory advocating the inevitability or
desirability of regularity and structure over spontaneity and innovation, or vice versa.

Nevertheless, it must be acknowledged that performance testing and the educational
movements on which it depends and with which it is associated are themselves inexorably
related to political and educational policies that at least represent, if they do not promote,
controversial social values, conflicting educational philosophies, and competing lifestyles.



The performance tester cannot but endorse, or at least sanction, those social perspectives
and values inherent within the view that parts relate, or ought to relate, in a unified manner to the
wholes to which they rightfully belong; that is, that task performances go to make up the job, or
that vocations are the sum of their respective individual tasks. Thus, regularity, predictable
performance, consistent production, ordered sequence, dependable service, formal relationships,
structured experiences, conditioned responses, reliable competence: these and other similar
characterizations make up the reality of human experiences and social relationships observed,
measured, and monitored by performance testing. No arguments are here proposed against these
qualities and processes intensely scrutinized, promoted, and rewarded through performance
testing.

But it is necessary to question the degree to which these kinds of values, realities, and
beliefs encompass the entire range of human experience and characterize the possible scope of
human relationships. When asked, one may be tempted to respond: very slightly. But, even if the
predictable qualities and structured processes measured by performance tests characterize most
human experiences and relationships, one could again ask critically: Are these ordered
sequences and conditioned responses the best parts of the whole sweep of human potentiality?
Well, probably not, nor were the elements tested in performance ever proposed to be the
highest, most challenging and valuable aspects of humanity, though they may promote higher
potentialities, whatever our priorities may perceive them to be.






So, again, a consideration of the philosophical issues in performance testing leads not to the
question of whether there is a legitimate, justifiable place for performance testing, but just what
is that place in the broader scheme of education, human development, and social interaction.
Whether one's value orientation or philosophical perspective assigns a relatively high or low
priority to the routinized behavioral regularities evaluated through performance testing, it would
be as difficult to judge them the best, the finest, the highest as to judge them the worst, the least,
the lowest. And somewhere between these two extremes, the measurable performance and the
measuring performance test lie as instrumentalities, mere means to competency, social
usefulness, and economic independence, but nevertheless as means to yet higher goals of
human development and relationships.

Performance testing, like any other means, may be elevated, even for the noblest reasons, to
an end in itself. Perceived as such, performance testing no longer serves but defines human
existence and experience. That life is likely to be void of diversity and dissent, of innovation and
inquisitiveness, of spontaneity and sparkle. It is hoped that the alternatives will not be reduced to
a choice between competency, competition, and control on one hand, and creativity, compassion,
and curiosity on the other. Just as we cannot learn in a rat maze all that is most worth knowing,
performance testing cannot evaluate all that we know and are, or should most desire to learn and
become.



Conclusion

Performance testing is more than a fad, a mere temporary stop-gap measure for
overwhelming perplexities that have been accumulating since World War II. Among those
perplexities were: rapidly expanding school enrollments, frantic responses to Sputnik, and
charges that our schools were failing, then mobilization to integrate minorities and handicapped
persons, followed quickly by social demands for greater equality of opportunity and the need to
move from an expanding economy to a steady state. Perhaps at no other time in history has any
social institution been called upon to accomplish so much as the American school system in the
past generation.

Normally schools reflect and follow the trends of the broader society. Yet, in the past
generation, when social goals have been unclear, educators have been called upon to mark out
new paths that the broader society has, in many cases, been reluctant to travel: integration,
conservation, innovation, accountability, economy, reconstruction of traditional belief patterns
and value systems. Performance testing, performance contracting, and competency-based
instruction are but a few of the major efforts within education to respond without clear social
goals or firm social support. No single one of these efforts, or even a combination of several,
could meet all the conflicting demands and competing needs placed upon the schools.

A few of the issues raised by performance testing have been reviewed. Its underlying
assumptions appear to conflict with both traditional cognitive aims and innovative affective
emphases. It raises questions of priority regarding individual autonomy and social responsibility.
It appears to contrast the mechanical with the humanistic, the quantitative with the qualitative,
the predetermined with the free and open and unpredictable. Performance testing, in fact, brings
to the fore the biting theoretical issues and value conflicts plaguing education and our broader
society today. As such, it provides a living laboratory for social and educational experimentation.

Experimentation demands caution and control, as well as creativity and courage.
Performance testing as an experimental arena is no panacea for all educational problems. Its
interests and capacities do not reach all human concerns. The conceptual framework of
performance testing is narrow and shallow compared to the breadth and depth of human
prospects and social needs. Its concept of performance is necessarily definite and precise, and
therefore not wholly adequate to cover the spectrum of individual interest, will, need, and
aspiration. Nevertheless, performance testing has its legitimate uses within its defined limitations.
The danger is that these limitations will be exceeded when it is called upon to provide more than
it has to offer.






Notes 

1. Thomas F. Green, "Minimal Educational Standards: A Systematic Perspective," mimeographed,
CEMREL (September-October 1977), pp. 3-20.

2. Ibid., p. 17.

3. Robert Hutchins, The Learning Society (New York: The New American Library, 1968), p. 115.

4. Ibid., pp. 36-37.

5. Ibid., pp. 124-5.

6. Joseph L. Boyd, Handbook of Performance Testing: A Practical Guide for Test Makers
(Princeton, N.J.: Educational Testing Service, January 1971), pp. 13, 75.

7. Kingsley Price, "The Sense of 'Performance' and Its Point," Philosophy of Education 1974:
Proceedings of the Thirteenth Annual Meeting of the Philosophy of Education Society
(Philosophy of Education Society, 1974), p. 21.




PHILOSOPHICAL ISSUES: 
COMMENTS 



Comments on the Philosophical Issues
in Performance Testing

John F. Thompson
University of Wisconsin
Madison, Wisconsin



The two authors present very different ideas. Borow helps the reader learn about
performance testing while Willers helps the reader learn of performance testing. These
distinctions are not minor. In learning about something we learn what it is and how it functions.
In learning of something we engage in new ways of thinking. It requires us to actively engage in
the examination of our assumptions.

Borow helps us understand the history of performance testing, some of its issues and
problems. Willers, on the other hand, takes us to basic assumptions and points out
inconsistencies with broader goals. The former, then, is more a technical paper and the latter a
more philosophical paper. While their differences are sharp and clear, they do complement each
other.

If philosophical inquiry helps us examine assumptions, what is an assumption? An
assumption is something which is taken for granted or supposed and, therefore, cannot be
verified in a scientific sense. If an idea can be proved, it ceases to be an assumption and
becomes a fact. All of us act on our assumptions, even those that are not examined.

Assumptions need to be examined in light of reliability. A belief is reliable when it always
results in the same outcome. Assumptions need to be examined in light of their validity. A belief
is valid when it conforms to new knowledge and experiences. And finally, assumptions need to
be examined in light of consistency. That is, the entire set of assumptions about a concept like
performance testing needs to support and work together rather than against each other.

With this framework in mind, let us examine the two papers. The strength of Borow's paper, I
have already indicated, is that it identifies some of the issues and problems of performance
testing. Its weakness as a philosophical paper is that it does not go far enough in examining
many of the assumptions identified or implied. I found the early sections of the paper to
be more profound than the latter. Early in the paper the author presents the "tacit assumptions of
testing." These are said to be individual differences in people that can be measured, the stability
of measured individual differences, and our ability to predict student performance from a test
situation to an external nontest setting such as a job. These are powerful assertions. While all are
not examined here, they need to be by those who favor performance testing.

I admire, particularly, the section on validity. There the author analyzes the assumption of
predictability from school to job. It is pointed out that

"the failure of the typical performance test to tap relevant factors in on-the-job training
behavior or bona fide job behavior may limit its capacity to furnish a comprehensive
and accurate index of the student's competency. Performance tests customarily
appraise an array of cognitive and psychomotor skills. Yet, the affective domain is
clearly part of the universe of performance on the job. Successful performance in the
vast majority of occupations rests at least partially on the worker's attitudes and
personal disposition toward the work scene, such as pride of workmanship, compliance
with rules of the workplace, quality of interpersonal relations, dependability, and
integrity."

The section ends, however, with a practical and technical emphasis on how the validity problem
may be solved.

Willers' paper identifies and examines basic assumptions of performance testing. It was
necessary for me to read the introduction a couple of times before understanding its purpose,
which I concluded to be one of sensitizing the reader to the broad issues. While I wanted to
argue with minor points, its conclusion is the focal point:

"In general, these issues speak to the limitations, narrowness, and inadequacies of
performance testing when over-stressed or undertaken to the exclusion of other claims and
interests. Though threatening to narrow self-interests, such critical messages of
caution can provide clarity and breadth of purpose together with insights into worthy
means of judging human development and achievement."

It is pointed out that:

"A performance test is presumed to be a measure of occupational competency or
the ability to perform job-related skills. This presumption, in turn, is based on the
assumption that job skills, and even overall occupational functions, can be reduced by
meticulous analysis to meaningful, manageable, and measurable sequential segments.
The component performance of these work segments may be examined and evaluated.
The purpose of performance testing, accordingly, is to discern the quality of a
particular job-related skill or to qualify for a particular occupation."

I think another assumption needs to be added to this section. We tend to assume in
vocational education that if we know which skills are necessary for occupational competence, we
know how to teach them. This leads to another dimension that is neglected in this paper. It
relates to the lack of assertions about learning theory as it is used to support performance
testing.

The remainder of Willers' paper is very powerful. It is a very concise philosophical treatise of
performance testing. In fact, I wish I had written it.

In sum, while the papers are very different, there are points on which the authors tend to
agree. Remember, Borow's paper is more technical. It tends to offer the position that
performance testing is rather value-free and its real problems are test validity and reliability. On
the other hand, Willers tends to identify assumptions for critical judgments. Nevertheless, they
tend to agree that:

• Performance testing is a narrow educational perspective.

• Performance testing does not adequately assess the impact of the affective domain on
successful job performance.

• Performance testing has a national perspective.

• Performance testing has a social perspective.

• Performance testing has an inherent conflict between individual and social goals.






CHAPTER THREE 



TECHNICAL ISSUES 



The technical issues affecting performance testing are either addressed directly or alluded to by
every contributor in this handbook. While technical issues such as validity and reliability do cross
all of the other issue areas, they have enough importance to stand on their own and warrant a
chapter solely devoted to them. Therefore, Chapter Three begins with a discussion of technical
considerations by Evelyn Perloff, where validity, reliability, efficiency, test bias, and observer/rater
variability are addressed. In discussing each of these considerations, she relates their role in
classical measurement theory and the applicability of the concepts to performance testing. For
example, consistency validity is described as a promising validation approach for performance
tests.




Raymond Klein authored the second paper, which provides a more pragmatic approach to
performance testing. He focuses on developing of performance tests: testing process;
standardization and norms; determining cut-off scores; providing test-related materials; and
revising tests. The chapter concludes with Samuel Livingston providing a third perspective on the
technical issues facing performance testing in the Comments paper.






Technical Considerations: Validity, Reliability, Efficiency, and
Observer/Rater Variability



Evelyn Perloff 
University of Pittsburgh 
Pittsburgh, Pennsylvania 

The purpose of this paper is to describe characteristics of effective testing instruments.
There are three crucial characteristics of a good test: validity, reliability, and efficiency. That is, a
good test should (1) provide information relevant to announced objectives or uses to which the
test will be put (validity), (2) indicate consistent information about those tested (reliability), and
(3) be convenient, pertinent, and economical to administer and interpret (efficiency). It is
generally conceded by measurement experts that the most fundamental characteristic of a good
test is validity, with reliability generally considered secondary. Least important are those
additional considerations that include efficiency and a variety of characteristics which reflect a
test's utility.

This paper discusses characteristics of a good test that derive from classical measurement
theory. Although performance testing calls for modifications of classical measurement theory,
these revisions have been slow in coming and as a result much of classical measurement theory
remains appropriate. There are, however, some hopeful indications that useful changes are being
developed for specific evaluation of performance tests. These will be presented here whenever
appropriate. Two procedures of particular concern to performance testing are observing and
rating what individuals do in test situations. The last section of this chapter will therefore present
a brief consideration of both procedures, with special attention to the issue of observer and rater
variability.

Validity

Although validity is considered the most important feature of measuring instruments, it
remains the most difficult to assess because it is the most complex. Furthermore, validity
involves a number of considerations that are external to the test itself, yet need to be related to
test performance. Validity has been defined in several ways, but these definitions stress the same
general idea: Does the test measure what it is supposed to measure? If the answer is yes, then
the test is considered valid; if the answer is no, then the test is not regarded as valid. Validity is a
matter of degree, not an "all or none" condition. That is, two tests can be assessed as valid, but
one may be more valid than the other because it does a better job of measuring what it is
supposed to measure. There are also four different kinds of validity. Depending on how validity is
defined, the four kinds are: (1) criterion validity, (2) content validity, (3) construct validity, and (4)
consistency validity. The first three apply to classical measurement testing and the fourth is more
specific to performance testing. The four kinds of validity are discussed below.






Criterion Validity

Teachers and managers frequently need to compare test achievement with school or job
performance. That is, tests are administered because it is necessary to predict present or future
abilities. The emphasis here is not on what the test measures, but rather on how well it predicts;
that is, the quality of the test is not determined by the test's content per se, but rather with the
ability of performance of that content to predict later school achievement or job performance. If
subsequent school or job expectations, based on earlier test performance, are confirmed, then
the test has criterion validity. Criterion validity has also been called criterion-related or
concurrent and predictive validity.

Criterion validity is so termed because it relates to a criterion (standard) or rule for judging
the value of something. In measurement, a criterion is performance (academic grades,
supervisory ratings, job proficiency) against which the value of a test score is judged. Thus, a
test has criterion validity if individuals who are judged successful on the criterion (do well in
school, obtain high job ratings, perform effectively on-the-job) are those who also obtained the
high test scores. Similarly, we would expect individuals who are judged unsuccessful on the
criterion (do poorly in school, obtain low job ratings, perform inadequately on-the-job) to be
those who obtained the low test scores. In contrast, a test does not possess criterion validity if
there is little agreement in how individuals perform on the criterion and how the test assesses
their abilities. That is, higher test scores correspond to a range (low and high) of school or job
measures and low test scores correspond to a range (low and high) of school or job measures.
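
The agreement between test scores and a criterion described above is commonly summarized as a
correlation coefficient. The following is a minimal sketch in Python, assuming the Pearson
correlation is an acceptable index of that agreement; the function name and all scores and
ratings are hypothetical and are not taken from this paper.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    dev_x = [x - mean_x for x in test_scores]
    dev_y = [y - mean_y for y in criterion_scores]
    covariation = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical data: six trainees' test scores and two sets of supervisor ratings.
test_scores = [52, 61, 70, 78, 85, 93]
agreeing_ratings = [2.1, 2.8, 3.0, 3.6, 4.1, 4.7]   # high scorers rated high
unrelated_ratings = [3.9, 2.2, 4.5, 2.4, 4.0, 3.1]  # ratings scattered across scores

print(round(pearson_r(test_scores, agreeing_ratings), 2))
print(round(pearson_r(test_scores, unrelated_ratings), 2))
```

A coefficient near 1 corresponds to the first case in the text (high scorers succeed on the
criterion); a coefficient near 0 corresponds to the case where high and low test scores each
span the full range of criterion performance.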

Criterion validity presupposes that a criterion is relevant and has been accurately measured.
That is, not any criterion will do. A criterion must be salient for those who wish to make
personnel decisions on the basis of test scores. For example, grades are viewed as a salient
school criterion, but number of hours studied or ability to outline material effectively, although
worthwhile and perhaps means to an end for grades, may not in themselves be considered good
criterion measures. Obviously, selecting a criterion is no easy task since the complex and
difficult issues inherent in the concept of validity are true for criteria as well as for tests. This
predicament is readily observed for the two most commonly used (and supposedly most
appropriate) criteria: school grades and on-the-job performance ratings. Unfortunately, too little
effort is expended on the criterion side of the ledger. We suspect that until this state of affairs is
modified, criterion validity may not accurately reflect an instrument's effectiveness.



Content Validity 

Judging the adequacy of a test's substance or content describes the process of content
validation. It seeks to answer the question: Does the test measure what the test constructor
(teacher or manager) thinks it does? Judgment in this context generally refers to evaluation by
experts in a content area.

Content validity is typically applied to tests measuring outcomes of education and training.
For the most part, these tests are achievement tests or representative samples of the universe of
appropriate content. The process of content validation specifies clearly defined steps to ensure
that the final product, the test, has maximum content validity. A first step involves relating
instructional content on the one hand to a taxonomy of objectives on the other. This step
encourages delineation of expected instructional outcomes as well as detailed student behaviors.
It resembles preparation of an efficient lesson plan.






Following this, appropriate measures of the expected instructional outcomes and student
behavior can be developed. This second step requires representative sampling of the test's
content. The completed test is then ready to be judged by appropriate experts in the area
regarding adequate coverage of its content. If the experts concur, the test is considered to have
content validity. If the experts are critical and disagree, the test will not be assessed as having
content validity.
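
The representative-sampling step can be given a rough mechanical check before the test goes to
the experts. The sketch below, in Python, is an illustration only: the blueprint weights, the
objective names, the tolerance, and the function name are all hypothetical, and such a tally is
no substitute for the expert judgment the text describes.

```python
from collections import Counter


def coverage_gaps(blueprint, item_objectives, tolerance=0.10):
    """Compare each objective's share of test items with its intended weight.

    blueprint: {objective: intended share of the test, shares summing to 1.0}
    item_objectives: one objective label per test item
    Returns {objective: actual share - intended share} for every objective
    whose difference exceeds the tolerance.
    """
    total = len(item_objectives)
    if total == 0:
        raise ValueError("the test has no items")
    counts = Counter(item_objectives)
    gaps = {}
    for objective in sorted(set(blueprint) | set(counts)):
        intended = blueprint.get(objective, 0.0)
        actual = counts.get(objective, 0) / total
        difference = actual - intended
        if abs(difference) > tolerance:
            gaps[objective] = round(difference, 2)
    return gaps


# Hypothetical blueprint for a ten-item shop test.
blueprint = {"measuring": 0.4, "cutting": 0.4, "safety": 0.2}
lopsided_test = ["measuring"] * 4 + ["cutting"] * 6
balanced_test = ["measuring"] * 4 + ["cutting"] * 4 + ["safety"] * 2

print(coverage_gaps(blueprint, lopsided_test))
print(coverage_gaps(blueprint, balanced_test))
```

The lopsided test over-samples one objective and omits another, which is the kind of gap an
expert reviewer would be expected to raise; the balanced test reports no gaps.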

Although content validation can progress in an orderly fashion, execution of the primary
steps (subject matter selection, outcome specification, content sampling, and ultimate judgment
by experts) tends to be highly subjective. It appears unrealistic, then, to expect constant close
correspondence between what a test author includes in a test and how that test is judged by
experts in the field. Unfortunately, in many situations, there may be no other alternative than
content validity as a measure of a test's effectiveness. Lennon states it well when he says that

in many testing situations (of which achievement testing forms the largest class) there 
is not available or readily accessible any dependable criterion variable, against which 
the "validity" of the test may be measured; and secondly, is the fact that there are 
certain uses of tests for which correlations with either contemporary or subsequent 
criteria are not meaningful as indicators of validity.' 

It is probably with regard to content validity that performance tests fare best. Their contents
appear to resemble the objectives and contents established by a curriculum and are therefore
readily acceptable to educators and job trainers. In fact, Borow points out that when
performance tests have

highly relevant content they are so compellingly convincing in appearance that
vocational educators, on-the-job (OJT or JIT) training supervisors, and industrial
personnel recruitment officers are tempted to accept derived scores from such
performance tests as tantamount to job proficiency.

A final issue regarding content validity that pertains specifically to performance tests as they
relate to minimum competency testing is to view content validity in terms of curricular and
instructional validity. Curricular validity determines how well a test measures a curriculum's
objectives. This involves a comparison of test and curriculum objectives. Instructional validity
measures whether the schools provided the content assessed by the test. Both curricular and
instructional validity place additional burdens on tests that are beyond that generally demanded
by content validity in measuring student and employee performance.



Construct Validity 

Whenever it is necessary to consider one or more underlying properties or constructs
(concepts) that an instrument measures, then the relevant validation procedure called for is
construct validity. It is an analysis of the meaning of test scores in terms of psychological
concepts or "constructs". This kind of validity is considered the most significant and important
because it derives directly from theory. Unlike criterion and content validities, the process of
construct validity is not easy to understand. It is intricately linked to science and is the same
process as that used to generate and test scientific theories.






There are various types of evidence that can be considered in establishing construct validity.
The term "construct" refers to an underlying trait, disposition, or ability, such as anxiety,
congeniality, motivation, responsibility, or verbal influence. These are only a few of a
large number of possible constructs. There are two primary ways to obtain evidence of construct
validity. First, a test has construct validity if it differentiates between individuals who rank high
and those who rank low on the construct underlying the test. (Note that in this case the
construct is in fact a criterion measure.) Second, a test has construct validity if the theory
proposes certain modifications of the construct and these in turn produce corresponding test
score changes. Most frequently construct validity is accomplished by examining a group of tests
believed to be measuring the same thing. Then, the characteristic underlying
these tests (a construct) is identified by using a statistical procedure called factor analysis. The
technique of factor analysis permits reduction of a complex domain of many variables to one of
simpler structure with fewer variables. This analytic procedure identifies tests or measures that
are closely related (highly correlated) with one another. That is, these tests or measures are
similar; they belong together. The reduced number of characteristics or variables underlying
groups of similar tests or measures are then called factors or constructs. It is important to
remember that the construct identified will depend on the specific tests and measures included
in the factor analysis. According to Ekstrom, there are a number of problems affecting construct
validity when factors of a factor analysis are used as criteria. The first problem results when a
number of tests identified by the same construct are actually measuring different things. Second,
characteristics of examinees affect the factor structure of a test. That is, definitions of mental
health differ by sex and race. For example, if males and females exhibit the same behavior, it
may be rated as highly aggressive for the female but only moderately aggressive for the male.
Similarly, some personality measures are affected by race "because nonpathological racial
variance contributes to elevated scores on some scales."

A third and last concern relates to examinees' use of different strategies to solve problems
presented in tests. For example, it has been demonstrated that although many individuals
mentally manipulate figures in solving spatial visualization problems, others use an analytic
strategy to separate figures into elements and then look for similarities. Similarly, according to
Groen and Parkman, most adults use memory to solve problems of simple addition, but
most children and some adults use incremental counting to solve these problems.

Construct validation is obviously a much more complex and time-consuming process than
either criterion or content validity. As described by Cronbach, "construct validity is established
through a long-continued interplay between observation, reasoning, and imagination"; and
according to Kerlinger, it has been "recognized as a central kind of validity" by the American
Psychological Association. In summary, construct validity appears as a
validation procedure worthy of the necessary time, effort, and expense required to identify as
well as measure the relevant construct.
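
The idea that closely related (highly correlated) tests "belong together" can be illustrated with
a small sketch. The Python code below is not factor analysis, which estimates underlying factors
and loadings; it is a deliberately simplified stand-in that only groups tests whose scores
correlate above a chosen threshold. The test names, scores, threshold, and function names are
all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    covariation = sum(a * b for a, b in zip(dev_x, dev_y))
    return covariation / sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


def group_correlated_tests(scores, threshold=0.7):
    """Greedy grouping: a test joins the first group in which it correlates
    at or above the threshold with every member; otherwise it starts a new group."""
    groups = []
    for name in scores:
        for group in groups:
            if all(pearson_r(scores[name], scores[member]) >= threshold
                   for member in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical scores of six examinees on four tests.
scores = {
    "blueprint_reading": [10, 12, 15, 18, 20, 23],
    "spatial_puzzles": [11, 13, 14, 19, 21, 22],
    "vocabulary": [30, 18, 27, 15, 29, 20],
    "reading_comprehension": [28, 17, 29, 16, 27, 22],
}

for group in group_correlated_tests(scores):
    print(group)
```

As the text cautions for factor analysis itself, the groups found depend entirely on which tests
are included; adding or dropping a test can change what appears to be the underlying construct.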






Consistency Validity

The previous discussion has presented the classical model used to establish test validity.
That is, as pointed out by Wernimont and Campbell, the classic validity model uses tests as
"signs, or indicators," instead of sampling appropriate behaviors to predict future performance. As
many writers have pointed out, particularly those who have tried to predict on-the-job
performance, the classical model has not always been effective.
There is substantial evidence to indicate that validities for many predictors (measures of mental
ability, specific and general aptitude measures, achievement tests, interests, or personality
dimensions) of job performance have remained low. In fact, these conditions have persisted for
over 50 years in spite of extensive efforts by professionals in government, industry, and the
military to ameliorate this state of affairs. Hopefully, we are finally ready for "an idea whose time"
has long since passed. What is being proposed, then, is to modify criterion validity as it is now
defined to stress consistency between criteria and predictors. Or as Wernimont and Campbell
state:

The essence of the suggested procedure is the establishment of consistencies between
relevant dimensions of job-behavior and preemployment-behavior samples obtained
from real or simulated situations. If samples instead of signs are employed, a number
of prediction and measurement problems seem to be alleviated or at least confronted
more directly.11

In other words, the shift in criterion validity that is being suggested is from predictors as
signs to behavior as samples of future performance. Wernimont and Campbell describe it well
when they say, "The best indicator of future performance is past performance."12

A related issue here is a tendency by those in measurement to refer to any relationship
between similar behavioral measures as reliability rather than validity. Classic measurement
theory defines validity as the correlation between dissimilar predictors and criteria. In contrast,
consistency validity looks to relationships between similar predictors and criteria. This latter
notion of behavior sampling appears to be the basis of a large domain of performance
assessment: namely, simulation. Wernimont and Campbell also point out that the approach
seems to underlie prediction from biographical inventories that include items that "represent an
attempt to assess previous achievement on similar types of activities."13

Four possible steps constitute application of the consistency model. The first step entails an
extensive job analysis, with specific attention to those job dimensions which relate to critical
behaviors for successful and/or unsuccessful job performance. Second, each applicant's
background (education and experience) is assessed for manifest critical behaviors. Step 3
follows whenever an applicant's background data do not include relevant job behaviors. This step
requires administration of numerous work-sample tests and/or simulation activities. The fourth
and final step involves use of "individual performance measures of psychological variables"14
whenever possible.

A final issue involved in consistency validity is that predicted measures must not only be
behavioral measures but also observable job behaviors that relate to performance competency.
Behavioral measures of the performance of, say, a production manager would refer to
assessment of such job activities as scheduling requirements, operating costs, spillage and
waste, employee absenteeism and tardiness, procurement, and future planning. These become
the predictor measures (or behavior sample) and must be similar to and therefore predictive of
the criterion (or measures to be predicted). It follows then that such frequently adopted criteria
as salary increases and promotions are inappropriate. Neither criterion can be considered a job
behavior nor can the individual exert significant control over either of them.

Some Advantages. Although consistency validation is not a total panacea for problems
associated with criterion validity, it can provide better returns in seeking to understand job
performance by stressing behavior measurement. As suggested by Wernimont and Campbell,
there are four primary advantages that consistency validity has over criterion validity. These are
presented below.






1. Stability of relevant job behaviors. In spite of productive research relating to performance
criteria, there appears to be little information documenting stability of relevant job behaviors. It
follows then that consistency validity, which speaks to recurring and relevant job behaviors, is a
more applicable validation approach than classical methodology, which attempts to generalize from a
one-time criterion measure to an appreciable time span of job behavior. That is, unlike
criterion validity, consistency validity stresses longitudinal prediction.

2. Faking and response sets. Since consistency validation maximizes behavior and minimizes
self-reporting, the usual response biases that affect self-reports will be significantly reduced.

3. Discrimination in testing. As pointed out by Doppelt and Bennett,17 two common criticisms
made against tests are (1) lack of relevance and (2) unfairness of content. Both charges have had
deleterious consequences on testing programs, particularly in business and industry. Thus, a
number of legal cases have shown that many job skills and knowledge can be obtained through
on-the-job training programs, regardless of test performance. Similarly, many tests have been
considered "culture-dependent." Test items stress white middle-class values that result in an
inaccurate appraisal of the disadvantaged or those who have not been influenced by white
middle-class culture or education.

4. Invasion of privacy. This is the fourth and final problem that the consistency validity
approach dissipates. That is, there is need neither to develop new tests each year nor to
maintain strict security over testing materials by test developers. The tests, by
specification and design, are to resemble job behaviors. Thus, these behavior samples, by their
very nature, serve as obvious links between preemployment and on-the-job behaviors.

Consistency validity appears to be a promising validation approach. It is suggested as a
replacement for criterion validity only (not for construct validity), and it is particularly
appropriate for performance tests because it focuses on the measurement of behavior. That is,
consistency validity substitutes behavior samples for predispositional signs, stresses longitudinal
over one-time criterion measurement, and can significantly reduce persistent testing problems of
response sets, discrimination, and invasion of privacy.




Reliability 

Reliability (Cronbach prefers the term generalizability)18 is the second most important
characteristic to consider in evaluating measuring instruments. A variety of terms have been used
to define reliability. They include accuracy, agreement, consistency, dependability,
generalizability, homogeneity, precision, regularity, stability, and trustworthiness. Of these terms, consistency
is probably considered most representative, although not totally encompassing. Consistency here
refers to stability or trustworthiness of test performance over time. Unfortunately, reliability
measurement involves an indirect and statistical conceptualization. Thus, it is assumed that a
"true score" exists on a particular test for every individual, but these scores are indeterminate.
They could, however, be approximated if the test were administered many times to the same
individual. Not only is this unreasonable but it is also unrealistic since a test is usually
administered only once. Hence, reliability is interpreted as that proportion of the variance
attributed to variation in the "true scores."

Estimation is essential here because behavior fluctuates, with the result that performance
varies from one time to the next. Furthermore, no single measurement can be expected to typify
an individual's behavior completely; it can only serve as a rough approximation. Test theory
provides techniques for assessing this variability of test scores in order to estimate "true"
performance. The most familiar estimation approach is to determine the standard error of
measurement, which provides an indication of the magnitude of error between "true" and observed
performance.
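
The link between reliability and the standard error of measurement can be illustrated with a
short computation. The sketch below is not part of the original paper. It assumes the
conventional classical test theory formulas (reliability as the ratio of true-score variance to
observed-score variance, and standard error of measurement as the observed standard deviation
multiplied by the square root of one minus reliability). The function names and the sample
figures are hypothetical.

```python
import math


def reliability_from_variances(true_variance, observed_variance):
    """Reliability as the proportion of observed variance due to true scores."""
    if observed_variance <= 0:
        raise ValueError("observed variance must be positive")
    return true_variance / observed_variance


def standard_error_of_measurement(observed_sd, reliability):
    """Conventional estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return observed_sd * math.sqrt(1.0 - reliability)


# Hypothetical figures: observed standard deviation of 10 points, reliability of .84.
sem = standard_error_of_measurement(observed_sd=10.0, reliability=0.84)
print(f"standard error of measurement: {sem:.2f} points")
```

Under these assumptions, a less reliable test yields a larger standard error, so a single
observed score is a rougher approximation of "true" performance.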

There are additional approaches for assessing reliability, where comparisons require making
at least two observations per person. The emphasis here is on consistency or lack of error. That
is, a reliable test is one that is devoid of error, where error refers to test score inconsistencies
resulting from a variety of influences and conditions that plague measurement. These errors are
random or chance fluctuations that do not result from changes due to the nature of what is being
measured, but may result instead from variability on the part of the test taker, due to such factors
as fatigue, low or high motivation, and variability in the interpretation of ambiguous test
questions.

There are two comparisons for checking consistency: (1) administering equivalent parts or
complete tests on the same occasion and (2) administering the same test on several occasions.
The former approach (measuring on one occasion) indicates how well two sets of comparable
test scores agree when they have been obtained at the same time. The latter approach
(measuring on several occasions) compares agreement of two or more sets of test scores when
they have been obtained at different times. Both approaches examine the four major sources of
test-score variation that affect reliability. These have been succinctly specified by Cronbach19 as
four kinds of characteristics that influence an individual's performance: (1) lasting and general
characteristics, (2) lasting and specific characteristics, (3) temporary and general characteristics,
and (4) temporary and specific characteristics.



Measuring on One Occasion

Two methods are available for determining reliability in this situation: alternate form
(administering equivalent forms of a test) and internal consistency (dividing a single test into
equivalent parts). The two major sources of test-score variation that are counted as error here
and hence reduce reliability include both lasting and temporary specific characteristics of the
individual. These specific characteristics are appropriately illustrated by (1) lasting skills,
abilities, attitudes, and knowledge called for by the particular test, and (2) temporary memory
fluctuations, motivational changes, luck, and emotional states related to the particular test.
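
The internal consistency method can be sketched as a split-half computation. The example below
is an illustration only and is not drawn from the paper: it assumes an odd-even split of items,
a Pearson correlation between the two half scores, and the Spearman-Brown correction to
estimate full-length reliability. The item data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(half_correlation):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * half_correlation / (1 + half_correlation)


def split_half_reliability(item_scores):
    """item_scores holds one row per examinee and one item score per column."""
    first_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(first_half, second_half))


# Hypothetical right/wrong (1/0) scores for five examinees on a six-item test.
item_scores = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(f"split-half reliability: {split_half_reliability(item_scores):.2f}")
```

The correction is included because each half is only half as long as the test actually
administered, and shorter tests tend to be less reliable.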



Measuring on Several Occasions

Two methods are also available for determining reliability over time: retest (administering the
same test after an appropriate time interval) and delayed alternative-forms (administering
equivalent forms of the test after an appropriate time interval). The exact length of the interval is
not of major concern, only that it be long enough to minimize effects of memory. The two major
sources of test-score variation that count as error in this case are general and specific temporary
characteristics of the individual. These temporary characteristics are fittingly illustrated by (1)
general knowledge, skills, attitudes, and habits related to the particular test, and (2) specific
memory fluctuations, motivation changes, luck, and emotional states related to the particular
test.
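
A retest comparison reduces to asking how closely two sets of scores from the same examinees
agree. The following sketch assumes, as is conventional but not stated in the paper, that
agreement is summarized by the Pearson correlation between the two administrations. The scores
and the function name are hypothetical.

```python
import math


def retest_reliability(first_scores, second_scores):
    """Pearson correlation between scores from two administrations of a test."""
    if len(first_scores) != len(second_scores):
        raise ValueError("each examinee needs a score on both occasions")
    n = len(first_scores)
    mean_1 = sum(first_scores) / n
    mean_2 = sum(second_scores) / n
    cov = sum((a - mean_1) * (b - mean_2)
              for a, b in zip(first_scores, second_scores))
    ss_1 = sum((a - mean_1) ** 2 for a in first_scores)
    ss_2 = sum((b - mean_2) ** 2 for b in second_scores)
    return cov / math.sqrt(ss_1 * ss_2)


# Hypothetical scores for six examinees tested twice, several weeks apart.
first = [12, 15, 9, 20, 17, 11]
second = [13, 14, 10, 19, 18, 12]
print(f"retest reliability: {retest_reliability(first, second):.2f}")
```

The same function applies to delayed alternative forms if the second list holds scores on the
equivalent form.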






Although validity is considered the single most essential requirement of a good test,
reliability helps to ensure a test's trustworthiness and dependability. Cronbach sums it up well:

Information on reliability is supplementary. It sometimes warns us that validity will be
limited just because of error of measurement, and it sometimes helps us plan the
accurate data-gathering procedure.20



Efficiency

The third characteristic to be desired in tests is efficiency. This characteristic refers to a
number of supplementary considerations that include sources of test bias, "face validity,"
applicability, and cost. Although none of these is as conceptually critical as validity, they do
relate to the test's effectiveness and should be examined as part of a test selection procedure.



Sources of Test Bias

As discussed by Ekstrom,21 tests should be as free as possible from different types of content
bias. These biases are (1) numerical, (2) role, (3) status, (4) stereotypic, and (5) familiarity. The
first four biases result when members of certain groups are underrepresented or overrepresented
by number, level, kind, and stereotype of activities in which they are portrayed in tests. The fifth
and final bias, familiarity, results when certain groups have had differential opportunities for
experience or familiarization with specific test content.

As Ekstrom22 points out, these biases have been well documented in the literature. Numerical
bias has frequently occurred because women are infrequently presented in achievement tests. In
contrast, role bias has been frequently found in test content because women are generally
portrayed as housewives, secretaries, and teachers, suggesting that women do not (or cannot)
enter all occupations. Similar to role bias is status bias, where women and minorities are rarely
portrayed in administrative and leadership positions. That is, they are teachers and salespersons
but not principals and managers. Stereotypic bias results when tests show (1) women as less
interested or able to work with mechanical equipment, preferring instead to work in homemaking
and helping areas; and (2) minorities as less interested or able to handle the professions,
preferring instead to remain as blue-collar workers.

The fifth and last bias, familiarity bias, is best illustrated by Ekstrom when she describes a
spatial visualization test "in which the subjects were told that the process of solving the
problems is similar to 'working with sheet metal.' Such a statement probably biased this test in
favor of males because it suggested that these items can only be solved by people who have
some knowledge of sheet metal work. The identical process could have just as accurately been
described as similar to 'working with a dress pattern.'"23

It is a sad commentary, indeed, to point out that not only do these biases affect
performance, but also that there is little, if any, research data to substantiate or refute them.



Face Validity 



This consideration refers to the nontechnical issue of consumer appeal. That is, public
acceptance of a test generally demands that it appear relevant and meaningful. A test that
appears appropriate and reasonable to those examined is said to have "face validity." Although
no quantitative assessment can be made of face validity, Cronbach24 states, "if a test is
interesting and 'sensible,' taking it is likely to be a pleasant experience," and this probably
produces valid scores.

Scores in this case pertain to the specific behaviors of the test and as such indicate the test's
purposes. It is important to consider two questions: Does the test measure what it is assumed to
measure? Does it adequately sample the appropriate content? In most cases, tests measure what
they appear to be measuring, but there have been occasions when this has not been so. That is,
so-called clerical aptitude tests with subtests seeking to measure numerical, equipment
identification, and information abilities have been found to be predictive of mechanical aptitude.
Thus, as Selltiz, Wrightsman, and Cook25 caution: "'just looking' is smug ignorance." Technical
validity (as previously presented) should not, of course, be sacrificed for face validity, and this is
not necessary because tests that have both technical and face validity are usually available.

Applicability 

A third measure of efficiency is the ease with which a test can be administered, scored, and
interpreted. A test is easy to administer if it does not require highly trained persons to administer
it. Similarly, a test that does not have either complex or specifically timed instructions will be
easier to administer. A test that can be objectively scored will be easier to handle than a test that
requires judgment or observation. And finally, a test that can be readily interpreted and
communicated by prepared check lists or tables is easier than a test that requires professional
expertise for interpretability and communicability.

Cost

The last consideration of efficiency is cost of test materials, administration, and scoring.
Costs can be reduced when it is possible to reuse test materials. If a large number of individuals
are to be tested, it may be more economical to obtain a full-service package from the test
publisher that covers test materials, scoring services, and reports of individual and group results.

Summary

In summary, a test is efficient when it is unbiased, acceptable (has face validity), applicable
(easy to administer, score, and interpret), and economical. Decisions regarding tests must
initially consider relevance and consistency of information. For this assurance, we turn to validity
and reliability. A final, but not necessarily insignificant, consideration is test efficiency. Certainly,
if validity and reliability of two tests are about the same, the decision regarding which test to use
should be based on matters of efficiency.



Observer/Rater Variability






Discussion thus far has concentrated on issues from classical measurement theory (validity,
reliability, and efficiency) that affect testing. As pointed out by Klein,26 however, a performance
test "involves observing and rating what individuals working at specific jobs, in a variety of
situations and conditions, actually do." As a result, developers of performance tests face
particularly difficult problems that do not generally confront those who design the more typical
achievement and aptitude tests. We refer here to obstacles inherent in the processes of observing
and rating. More specifically, we will limit discussion to validity and reliability issues of
observation and rating measures, presenting observation issues first.



Observation Issues 



Variability among observers arises primarily from two sources of error: variability within
individual observers and a variety of systematic observer biases. As presented by Simon,27
observer variability means "the inability of a given observer to repeat an observation again and
again in exactly the same way with exactly the same result"; and bias means "a tendency to
observe the phenomenon in a manner that differs from the 'true' observation in some consistent
fashion."






Overcoming observer bias is not an easy task. Biases appear to creep in regardless of how
much care is exercised. Ideally, then, the task "is to determine each observer's bias and allow for
it."28 Since this is highly unlikely, a more realistic approach is to use a number of tactics
specifically developed to decrease variability within observers which, in turn, also reduces
variability from bias among observers.

Six common tactics suggested by Simon29 that have been found helpful include (1) sufficient
training of observers, (2) detailed specification of tasks that observers are asked to perform, (3)
provision of specific written instructions for constant consultation by observers, (4) reporting
information as soon after observation as possible, (5) use of mechanical devices whenever
appropriate, and (6) obtaining information from several observers who observe at the same time.
As pointed out by Simon,30 these tactics "reduce the area of discretion within which bias may
operate" by (1) providing carefully detailed protocols for observers to follow, (2) discouraging
inferences from observers, and (3) stressing techniques that minimize forgetting and inaccuracy.
Thus, Hulett31 has advised "that a stubby pencil and a small battered notebook make people less
nervous than do more pretentious tools."

Observer reliability is concerned with interobserver agreement as well as with the agreement
of individual observers over time. It is, however, usually defined as "the degree to which two or
more observers agree on their observations."32 There appears to be no consensus on a single
formula to use in determining observer judgments, but a common method is to divide number of
agreements by the sum of number of agreements plus number of disagreements.33

According to Selltiz, Wrightsman, and Cook,34 this formula demands a brief observation time
to ensure that observers code the same unit of behavior. The formula gives overly high reliability
values, when percentage agreements are compared with chance levels, if there are
high base percentages and few categories.
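
The agreement formula, and the way it can overstate reliability when one category dominates,
can be sketched as follows. The percent-agreement function implements the formula described
above. The chance-corrected index shown alongside it is Cohen's kappa, which is an addition
for illustration and is not discussed in the paper. The observation codes are hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Agreements divided by agreements plus disagreements."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both observers must code the same units")
    agreements = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    disagreements = len(codes_a) - agreements
    return agreements / (agreements + disagreements)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the level expected by chance alone."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes from two observers for ten brief observation units.
# Nine of every ten codes are "on task", so the base percentage is high.
observer_a = ["on task"] * 9 + ["off task"]
observer_b = ["on task"] * 8 + ["off task", "on task"]

print(f"percent agreement: {percent_agreement(observer_a, observer_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(observer_a, observer_b):.2f}")
```

With these data the two observers agree on 8 of 10 units, yet the chance-corrected value falls
slightly below zero, because two observers who each coded "on task" nine times in ten would be
expected to agree about as often by chance.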

Rating Issues

Performance testing also frequently requires ratings of learning or work activities. For
example, to measure learning obtained in a short-term library experience, "we can complete a
rating scale to assess [his/her] learning, using such criteria as relationship to patrons, accuracy
of information provided, cooperativeness and attitude."35

Unfortunately, a variety of systematic errors are also present in ratings. For the most part,
these errors result from rater biases. Three common systematic errors include halo effect,
generosity error, and contrast error.

Halo effect results when raters generalize their impressions from one rating to another. That
is, they seek to achieve consistency or what Newcomb has called "a logical error; that is, judges
often give similar ratings on traits that seem to them to be logically related."36 Generosity error
occurs when raters overestimate positive qualities of individuals whom they like. Similarly, raters
appear to judge individuals as belonging to middle categories rather than assigning them to the
extremes. According to Murray, contrast error results because of "a tendency on the part of
raters to see others as opposite to themselves in a trait."37

There are, in addition, a number of sociocognitive biases that can be expected to affect
ratings. Thus, raters may

attach excessive weight to information that is highly concrete, salient, and easy to
remember . . . may be prone to overestimate the extent to which behavior is caused by
stable personality factors, while minimizing the impact of situational and environmental
forces on individual's behavior . . . and, because people are unaware of fundamental
statistical principles, they are susceptible to biases in judgment.38

These biases include only a portion of those that can influence judgment. Both validity and
reliability are reduced not only by systematic and random errors, but also by the many
sociocognitive biases that may occur. Unreliability of ratings among raters frequently results
from "the fact that some frame of reference is implicit in any rating; different raters may use
different frames of reference in describing individuals in terms of the characteristics in
question."39

It is fortunate therefore that a variety of ways exist for reducing errors and biases. Although
it is not possible to list the many techniques for minimizing these influences, we offer several
ways to improve rater accuracy, in addition to those listed for overcoming observer variability.
For example, one suggestion to reduce the constant errors described previously is not to use
extreme rating scale positions such as: The student always uses proper lighting in taking
photographs. A preferable (less extreme) statement would be: The student generally uses proper
lighting in taking photographs. Similarly, the use of neutral descriptive scale positions instead of
evaluative ones is likely to reduce generosity error. Biases can be avoided by adopting a
scientific approach and maintaining awareness "of the fallibility of judgmental processes."40

Summary

In summary, a technical discussion of performance tests should include issues of observer
and rater variability in addition to the classical measurement processes of validity, reliability, and
efficiency. Although observer variability is not easy to control, a number of tactics can be
adopted to reduce variability within observers which, in turn, also reduces variability among
observers. Ratings generally include a variety of systematic errors and sociocognitive biases that
affect both validity and reliability. As with observer variability, rating errors and biases can be
significantly minimized by adopting a number of similar techniques, the primary one of which
stresses a scientific approach. It is apparent therefore that overcoming biases and errors is
difficult and, regardless of how much care is exercised, they appear to creep in. The best
solution to these problems seems to be to use the variety of tactics specially developed to
decrease variability and increase validity and reliability.






Notes 



1. Roger T. Lennon, "Assumptions Underlying the Use of Content Validity," Educational and
Psychological Measurement, 1956, p. 297.

2. Henry Borow, "Philosophical, Practical, and Technical Issues Pertaining to Performance Testing
in Vocational Education," (Columbus, Ohio: National Center for Research in Vocational
Education, 1979), p. 18.

3. Stephen J. Slater, "Applied Performance Testing: Typology, Advantages, Limitations, and
Examples," (Portland, Oregon: Clearinghouse for Applied Performance Testing, Northwest
Regional Educational Laboratory, 1978), p. 26.

4. Ruth B. Ekstrom, "Issues of Test Bias and Validity," (Paper presented at Symposium, Revising
the 1974 APA Test Standards, American Psychological Association Annual Meeting, New York
City, September 1979), pp. 10-11.

5. Ibid.

6. Ibid.

7. Ibid., p. 12.

8. Lee J. Cronbach, Essentials of Psychological Testing (New York: Harper & Row, 1970), p. 142.

9. Fred N. Kerlinger, Behavioral Research: A Conceptual Approach (New York: Holt, Rinehart and
Winston, 1979), p. 141.

10. Paul F. Wernimont and John P. Campbell, "Signs, Samples, and Criteria," Journal of Applied
Psychology 52 (May 1968): pp. 372-76.

11. Ibid.

12. Ibid.

13. Ibid., p. 373.

14. Ibid., p. 374.

15. Ibid., p. 375.

16. Ibid.

17. John P. Doppelt and George K. Bennett, "Testing Job Applicants from Disadvantaged Groups,"
Test Service Bulletin, No. 57 (New York: Psychological Corporation, 1967), pp. 1-5.

18. Cronbach, Essentials of Psychological Testing, p. 154.






19. Ibid., p. 175.

20. Ibid., p. 182.

21. Ekstrom, "Test Bias and Validity," p. 1.

22. Ibid., pp. 2-3.

23. Ibid., p. 4.

24. Cronbach, Essentials of Psychological Testing, p. 182.

25. C. Selltiz, L. S. Wrightsman, and S. W. Cook, Research Methods in Social Relations, 3d ed.
(New York: Holt, Rinehart & Winston, 1976), p. 1?9.

26. Raymond S. Klein, "Some Selected Technical Issues Related to Performance Testing," in
Performance Testing: Issues Facing Vocational Education (Columbus, Ohio: National Center for
Research in Vocational Education, 1980).

27. Julian L. Simon, Basic Research Methods in Social Science: The Art of Empirical Investigation,
2d ed. (New York: Random House, 1978), p. 273.

28. Ibid., p. 276.

29. Ibid., pp. 278-79.

30. Ibid., p. 278.

31. J. E. Hulett, "Interviewing in Social Research: Basic Problems of the First Field Trip," Social
Forces 16 (March 1938): p. 364.

32. Selltiz, Wrightsman, and Cook, Research Methods, p. 287.

33. Ibid.

34. Ibid., p. 288.

35. S. Malak, J. E. Spirer, and B. Land, Assessing Experiential Learning in Career Education
(Columbus, Ohio: National Center for Research in Vocational Education, 1979).

36. Selltiz, Wrightsman, and Cook, Research Methods, p. 287.

37. Ibid.

38. R. M. Perloff, V. R. Padgett, and T. C. Brock, "Socio-cognitive Biases in the Evaluation
Process," in Ethics, Values, and Standards in Program Evaluation, edited by Robert Perloff and
Evelyn Perloff (San Francisco: Jossey-Bass, in press).

39. Selltiz, Wrightsman, and Cook, Research Methods, p. 409.

40. Perloff, Padgett, and Brock, "Socio-cognitive Biases," p. 25.






Some Selected Technical Issues Related
to Performance Testing

Raymond Klein

National Occupational Competency
Testing Institute (NOCTI)
Albany, New York

Developing Performance Tests

The current emphasis in performance testing is to develop measures of direct assessment of
skill attainment. In order to do this, a candidate is asked to perform a series of tasks based on
actual jobs that have been judged critical in relation to demands of a specific occupation. In this
context, "critical" means the demonstration of skills considered essential to perform adequately
in a specific occupation. In order to be able to construct valid performance tests, the test
specialist needs to obtain a timely occupational analysis from which the critical competencies
and tasks may be determined. Once these critical competencies have been uncovered, they
should be ranked in order of the frequency in which they occur, as well as their relative
importance in the job. In this fashion, a list of critical competencies may be identified.
Essentially, these key competencies set one role apart from another by identifying the elements
that give the occupation its uniqueness. (See Table 1.)

Unlike teacher-prepared examinations that can be put together after identification of the
objectives of a unit of instruction, a performance test designed to measure occupational
competency requires more extensive efforts to construct. Conducting an occupational analysis
involves observing what individuals working at specific jobs in a variety of situations and
conditions actually do. Out of this observational data, a categorization of the occupational
competencies must be made. The categorization provides the developer with a distribution of a
variety of tasks into divisions, each division representing more or less a unique major factor of
the particular occupation.

Each major division then has to be broken down into its respective subdivisions, thereby
grouping all subtasks into an orderly structure. After the information collected has been so
categorized, it should be reviewed by knowledgeable people in the field to confirm the validity of
the breakout. Having obtained a measure of consensus from knowledgeable individuals
regarding the competencies that comprise specific occupations, it is then necessary to
reorganize the specific tasks in a hierarchical manner so that the least critical competency is
placed on the bottom of the list and the most sophisticated understanding appears at the top of
the list. The competency with the highest point total (frequency x importance) would appear
first; the other competencies would be placed in descending rank order. Once this is
accomplished, it is necessary to identify examples of jobs or tasks based on these actual
job-related competencies, and to consider them for inclusion in a performance examination.






Table 1

A Model for Determining Actual Competencies

1. Identification of major divisions of the occupation

2. Identification of subdivisions of each major division

3. Identification of competencies required for each subdivision

4. Identification of critical competencies

   Critical Competencies = Frequency of Use x Importance to Job

   a. Frequency may be scaled:

      Criteria            Weighted Value

      Very frequent       (5)
      Frequent            (4)
      Average             (3)
      Occasionally        (2)
      Rare                (1)

   b. Importance may be scaled:

      Criteria            Weighted Value

      Critical            (5)
      Essential           (4)
      Important           (3)
      Needed              (2)
      Desired             (1)

Note: The jobs and tasks selected for inclusion in the performance test should measure an array
of the critical competencies both directly and subsumed.
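
The point-total ranking in Table 1 can be sketched in a few lines. The weights below are taken
from the table. The competency names and their ratings are hypothetical, and the function name
is an illustrative assumption rather than part of the model described in the paper.

```python
# Weighted values from Table 1.
FREQUENCY = {"very frequent": 5, "frequent": 4, "average": 3,
             "occasionally": 2, "rare": 1}
IMPORTANCE = {"critical": 5, "essential": 4, "important": 3,
              "needed": 2, "desired": 1}


def rank_competencies(ratings):
    """Return (competency, point total) pairs, highest point total first.

    Each rating is a (competency, frequency label, importance label) tuple.
    Point total = frequency weight x importance weight.
    """
    scored = [(name, FREQUENCY[freq] * IMPORTANCE[imp])
              for name, freq, imp in ratings]
    # sorted() is stable, so tied competencies keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical ratings for three competencies in one occupation.
ratings = [
    ("prepare cost estimate", "occasionally", "needed"),
    ("diagnose ignition fault", "very frequent", "critical"),
    ("replace brake linings", "frequent", "essential"),
]
for name, points in rank_competencies(ratings):
    print(f"{points:2d}  {name}")
```

A test developer would still need the review by knowledgeable practitioners described above,
since the ranking is only as sound as the frequency and importance judgments fed into it.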






In the past, because of the cost and time required for such undertakings, comprehensive
catalogs related to specific occupational competencies were rarely assembled. In recent times,
with states pooling their resources, organizations such as V-TECS and the Ohio Instructional
Materials Laboratory have been compiling catalogs of competencies related to the specific
occupations. These ventures have, in turn, been translated by various state departments of
education into curricula aimed at developing the specific critical skills. These same analyses that
are used to identify occupational skills and knowledge are also helpful to test developers for
selecting those competencies that need to be assessed by means of a performance examination.



Major Steps

Specific jobs or tasks have to be determined based on the critical competencies that were
identified. These jobs or tasks can then be used as the vehicle to assess skills. To develop the
test, it is advisable to bring together a committee of practitioners and teachers of the occupation.
This committee is used to identify the jobs and tasks that will be required to test a candidate's
understanding of the critical competencies needed in the work setting. This can be accomplished
by having the committee:

1. Review the specific competencies and then identify potential tasks or jobs.

2. Hypothesize regarding what might be appropriate jobs or tasks and then validate or
change the jobs through a process based on the analysis of the occupation. (In practice,
both approaches, individually or combined, are used.)

The competencies related to a specific occupation can also be arranged by level. For
example, skills usually identified with a skilled worker would be different, in certain respects,
from those of an apprentice.

Therefore, the competencies could be categorized by job levels within an occupation.
Organizing the competencies by level will help the test developer design examinations more
appropriate to a specific job or jobs within any occupation. Organizing by level will require the
additional breakouts related to major divisions and subdivisions of competencies. These listings
need to contain the actual understandings and skills required to function adequately at each
level.

In summary, once the major divisions have been identified and the competencies within each
level described, specific understandings and skills within each subdivision can be ascertained.
Such information forms a base for curriculum development as well as for the construction of
performance tests. Occupations are broken down into specific job levels, and in turn, each level
is arranged into specific competencies. The scope of each examination must be specific to the
level desired. The jobs selected for inclusion in the test should be based on these levels as well
and they should be representative of current practice in the occupation. (See Table 2.)

Additional Considerations

The jobs selected for inclusion on the performance test should evaluate different
competencies. Each job should adequately measure specific aspects of the critical competencies
required in the performance of the occupation. When a student undertakes to identify what may
be causing a malfunction, the logic of the troubleshooting approach should be assessed. There
must be a demonstration by the student of approved methods.



Table 2

Major Developmental Steps Related to Constructing Performance Tests

1. Identification of the field and level of jobs within each field

2. Determination of competencies through occupational and task analysis

3. Organization of competencies by level

   a. single skilled

   b. semiskilled

   c. skilled

   d. technical

   e. professional

4. Categorization of competencies by job level

5. Analysis of competencies per job level to identify critical competencies

6. Identification of jobs or tasks by which the critical competencies of individuals may be
judged, including scopes of examinations, equipment and materials

7. Identification of weighted criteria for each job or task along with preparation of rating
scales and scoring procedures

8. Standardization of testing procedures

9. Pilot testing of the instruments

10. Analysis of data

11. Revision of tests as needed

12. Field testing of examinations

13. Analysis of test and demographic data

14. Preparation of norms, reliability measures

15. Preparation of a technical manual

16. Research reports and studies

17. Establishing support systems, facilities, staff, operations

18. Undertaking steps for test revision

19. New test development activities

20. Special studies, stability, applications to other populations

21. Major revisions through repetition of the process

22. New development through redesign

23. Comparative analysis of alternative forms

24. Data collection and analysis

25. Test revision

26. Reporting and implementing new developments




The closer one can duplicate reality in a performance test, the better the measure will be.
The actual operation of machines, apparatus, instruments and tools used on the job should be
included. The step-by-step procedures involving designing, cutting, forming, turning, shaping,
and assembling units into components have to be demonstrated as well.

In situations where troubleshooting represents a major part of the occupation, such as in the
electronics field, the step-by-step procedures for locating the malfunctions in equipment and
instruments should be documented by the examinee. The student should also demonstrate his or
her ability to remove and replace defective parts or components, as well as to calibrate and
maintain instruments used in the occupation. To illustrate the approach, the machine tool
trades will be discussed.

The machine trades occupations can be divided into divisions such as layout, benchwork,
machine tools, heat treatment and so forth. Once these divisions have been made, it is necessary
to identify the critical competencies required to perform tasks and jobs within each division
successfully. This analysis will reveal that there are similar types of skills required to operate
different pieces of equipment. It is this recognition which will help the test developer synthesize
tasks and gain economy in terms of the number and types of jobs required to demonstrate
mastery of a competency. Table 3 lists some of the skills within one division of the machine
trades area.






Table 3

Selected Skills Necessary Within a Divisional Area

Major Division: layout and inspection

Subdivisions:

   calipers (use and application)
   steel rules
   protractor
   radius gauge
   micrometer
   hole gauge
   vernier calipers
   height gauge
   dial indicators
   measurement of surface finish
   blueprint reading
   sketching, and making of technical drawings
   use of layout fluid
   layout of work piece
   precision layout
   surface plate
   vernier height gauge
   comparator







In the machine trades area, the critical competencies included under layout and inspection
could include jobs that require such functions as layout of work, including centers, reference
contour and dimension lines, surface preparation using common hand and measurement tools,
surface plate and other holding or clamping devices, precision tools and gauges, testing and
inspection with precision inspection tools, precision blocks, gauges, indicators, hardness testers,
and use of a comparator. Therefore, the performance job selected should sample the procedures
that require a working skill using the measurement tools listed.

The specific tool or procedure selected would depend on the level of sophistication needed
to be judged competent in a given job situation. The criteria for assessing skill proficiency
should include both process and product measures and a rating scale used by an evaluator to
observe the subject. Performance of an individual taking such an examination might include such
criteria as:

Process (These criteria provide standards related to how each candidate undertakes to
accomplish the job, the methods and techniques used.)

1. Handling of layout tools

2. Planning of layout procedure

3. Layout process

Product (These criteria provide standards related to what each candidate accomplishes, the
outcomes.)

1. Accuracy

2. Precision

3. Time

Note: As the experience of candidates increases, product measures provide more important
indicators of competency.

When the ranking of individuals is important, ratings of performance on each criterion should be
noted; when absolute mastery is essential, a checklist may suffice. The first approach allows for norm
referencing while the second can be criterion referenced.

To the extent possible, various weights can be given to criteria. The weights should reflect
the importance and frequency of those criteria in relation to the competency being examined.
The more important aspects of the occupation should be weighted higher than less important
competencies.
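
One way to combine weighted process and product criteria into a single job score is a weighted mean. The sketch below is illustrative only: the criterion names come from the lists above, but the 1-to-5 rating scale, the particular weights, and the function name `weighted_score` are assumptions rather than anything prescribed in this chapter.

```python
def weighted_score(ratings, weights):
    """Weighted mean of criterion ratings; heavier weights count for more."""
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same criteria")
    total_weight = sum(weights.values())
    return sum(ratings[c] * weights[c] for c in ratings) / total_weight


# Hypothetical weights: product criteria weighted more heavily than process criteria.
weights = {
    "handling of layout tools": 1,
    "planning of layout procedure": 2,
    "layout process": 2,
    "accuracy": 3,
    "precision": 3,
    "time": 1,
}
# Hypothetical evaluator ratings on an assumed 1 (poor) to 5 (excellent) scale.
ratings = {
    "handling of layout tools": 4,
    "planning of layout procedure": 3,
    "layout process": 4,
    "accuracy": 5,
    "precision": 4,
    "time": 2,
}
score = weighted_score(ratings, weights)
print(round(score, 2))
```

Keeping the weights in a separate table makes them easy to review with the same panel of practitioners who validated the competencies.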

After the initial design of a performance test has been prepared, the test should be reviewed
by knowledgeable people in the field. This content validity step will increase the probability that
the content of the examination and the criteria are appropriate and adequate. In essence, this
would be the second major validity check of the examination, the first being agreement among
experts on the list of critical competencies.

Standardization



When the consultants have agreed on the test, all necessary information and materials to
conduct the examination should be prepared in a manner that will permit identical administration
of the test. Items to consider include:

1. A handbook providing directions for the examiner as well as for the student

2. A set of jobs that will be required by each candidate, including specific weighted criteria
and the amount of time usually required to complete each subunit of the test (A subunit
represents a job containing a series of specific job competencies.)

3. A scale stipulating specific criteria

Having assembled these materials, it is now desirable to pilot test the examination under a
variety of conditions. For example, the test might be given to:

• People who are currently employed in the occupation

• Students completing training in the occupation

• Students starting their training in the occupation

The developer should observe if the items are indeed functioning properly. Are there real
differences in the scores achieved by the different populations? Students who are beginning in
an area should do significantly less well than those who have been in the job for some time. If
these conditions do not prevail, then modifications to the test instrument are required.
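
A rough check of whether two pilot groups really differ can be sketched as follows. The chapter does not name a statistic; Welch's t is used here as one reasonable assumption, and the score lists and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in mean scores (group_a minus group_b)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical pilot scores.
employed = [88, 92, 85, 90, 95]      # people currently working in the occupation
beginners = [55, 60, 48, 62, 50]     # students just starting their training
t_statistic = welch_t(employed, beginners)
print(round(t_statistic, 2))
```

A large positive value suggests the test separates experienced workers from beginners; a formal significance test would also need the degrees of freedom and a reference distribution, which this sketch leaves out.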

The individuals who will be used as evaluators should be given an opportunity to take the
performance examination themselves. This type of hands-on experience will point out to the
evaluators some of the problems likely to be encountered by examinees. All of the conditions
required for the administration of the test should be the same for each administration of the test.
Because performance tests usually require the use of local equipment or tools, some variance in
scores cannot be completely controlled. Its effects can be reduced if candidates are checked
out on the equipment before the test or if they are permitted to use their own tools.



Validity, Reliability, and Norms

In addition to the content validity and agreement of judges, the results obtained from
performance examinations should be compared to other measures of student achievement, such
as a student's grade point average. A high correlation, in this case, would provide a measure of
the test's criterion validity. Supervisory ratings achieved by people in the workplace could also
be compared with the student's performance. Basically, if these measures were taken at
approximately the same time, there would be an indication of the test's "concurrent" validity. The
integration of several factors to measure an abstract concept is called "construct" validity.
Developmental efforts regarding the identification of such traits can be incorporated in
performance tests, if desired. Validity of the test is the degree to which the test measures what it
was designed to measure. (See Table 4.)



Table 4

Types of Validity and Their Application

Type                     Application                  Test

1. Content               Test of skill and            Samples desired domain.
                         training                     Judges' consensus.

2. Criterion-related     Prediction of future,        Correlation between scores
                         based on current data        and criterion, measured
                                                      over time.

3. Concurrent            Prediction at a              Correlation between scores
                         specific time                and criterion. Tests or
                                                      other measures obtained
                                                      at the same time.

4. Construct             Measurement of a             Explanation of variance
                         scientific idea or           through experimental
                         factor; abstractions         design.

Note: Performance tests are usually validated using content analysis. Other forms of validity also
can be applied.
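
The criterion-related and concurrent checks in Table 4 both rest on a correlation between test scores and some other measure. A minimal sketch of that calculation, using the standard Pearson product-moment formula, is given below; the data and the function name `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: performance test scores and supervisory ratings for five people,
# collected at about the same time (a concurrent validity check).
test_scores = [62, 70, 75, 81, 90]
supervisor_ratings = [2.5, 3.0, 3.5, 3.5, 4.5]
validity_coefficient = pearson_r(test_scores, supervisor_ratings)
print(round(validity_coefficient, 3))
```

With samples this small the coefficient is very unstable, so in practice it would be reported together with the sample size.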



When evaluator judgments are required, such as in the case of most performance tests, a
measure of interrater reliability is desirable. Administrative costs associated with obtaining such
measures are high.



A recent study at NOCTI demonstrated the efficacy of the use of Cronbach's Alpha measure
of reliability for determining the internal consistency of performance tests. There are other types
of reliability that, likewise, apply. For example, measures of stability can be obtained by using the
test-retest approach, or measures of equivalency can be obtained by comparing alternate forms of
the test by means of a correlation coefficient. Reliability is a ratio of the true variance in a set of
scores compared to the total variance obtained. True variance is what remains after all of the
factors which may contribute to error are explained or controlled. Errors may result from many
sources such as ambiguity of materials, test administration, inconsistencies, examiner biases,
and subject anxiety about test taking, to name a few. A summary of some of the approaches
related to obtaining measures of reliability for performance tests is presented in Table 5.
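
For readers who want to see how an internal consistency coefficient is computed, the sketch below uses the standard formula for Cronbach's alpha, alpha = k/(k-1) x (1 - sum of criterion variances / variance of total scores). The formula is not given in this chapter, and the ratings and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of ratings (one row per examinee, one column per criterion)."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two criteria and two examinees")
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: four examinees rated on four criteria.
ratings = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

Values near 1 indicate that the criteria rank examinees in much the same way; values near 0 indicate that they do not.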

A performance test that is both valid and reliable requires an application to a sample of the
population it purports to measure in order to establish norms. The field test can provide the data
from which standard scores may be derived. The scores can be reported as percentiles or in
some other appropriate form such as a "T" score, where the mean equals 50 and the standard
deviation equals 10. In addition to overall performance test score norms, it may make sense to
develop subscores for diagnostic reasons. Such measures can provide counselors and educators
with a more precise indicator of a student's accomplishment as well as a measure of unmet need
within a specific area of understanding.
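
The "T" score conversion mentioned above can be sketched as follows. The mean of 50 and standard deviation of 10 come from the text; the use of the population standard deviation of the norming sample, the raw scores, and the function name `t_scores` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def t_scores(raw_scores):
    """Convert raw scores to T scores (mean 50, standard deviation 10)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD of the norming sample
    if sd == 0:
        raise ValueError("all raw scores are identical; T scores are undefined")
    return [50 + 10 * (x - m) / sd for x in raw_scores]


# Hypothetical raw scores from a field test.
raw = [60, 70, 80]
converted = t_scores(raw)
print([round(t, 1) for t in converted])
```

In practice a new candidate's raw score would be converted using the mean and standard deviation of the published norm group, not of the candidate's own testing session.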

The method of providing standards, which has been described, is called norm referencing.
The standard scores are based on the distribution of scores of a sample for a specific population.
The norms provided by test publishers usually pertain to a large area; when feasible, local norms
should also be prepared.

Performance tests also can be scaled using other approaches. One such approach is
criterion-referenced norming. In this case a specific level of mastery is required for success. The
most widely recognized examinations that use this concept for norming are the tests given to
people who want licenses to drive a motor vehicle. Because performance tests by definition must
be content valid, they can be scaled in a criterion-referenced mode as well. This is because
criterion-referenced tests also have to be content valid.

The Rasch method of scaling may also have opened other ways for performance test
developers to scale tests. This scaling method is based on factors independent of population
considerations. In essence, the technique provides data related to the percent of students on a
specific level of development who would be expected to respond correctly to the tasks. By
testing students at different levels of achievement, one might arrive at a task characteristic curve
which then could be used as the standard. Conceivably, a single performance test could be
scaled applying the three methods in one instrument. The users would then select the norm that
best fits the purpose for which the examination is being applied. The norm-referenced approach
has become the acceptable standard for most tests. With time, especially as refinements occur
regarding related test theory in terms of the criterion-referenced and Rasch models, the use of
these newer techniques should find wider acceptance. Therefore, their application should
become more common, especially in the area of performance testing.
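
A task characteristic curve of the kind described above can be sketched with the usual one-parameter logistic form of the Rasch model, P = 1 / (1 + exp(-(ability - difficulty))). The formula is the standard one and is not stated in this chapter; the ability and difficulty values and the function name `rasch_probability` are hypothetical.

```python
from math import exp


def rasch_probability(ability, difficulty):
    """Probability of completing a task correctly under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


# Hypothetical task of average difficulty (0.0) and examinees at five ability levels,
# both expressed on the same logit scale.
task_difficulty = 0.0
curve = [
    (ability, rasch_probability(ability, task_difficulty))
    for ability in (-2, -1, 0, 1, 2)
]
for ability, probability in curve:
    print(f"ability {ability:+d}: expected proportion correct {probability:.2f}")
```

The curve rises with ability and passes through 0.5 where ability equals task difficulty, which is what allows the task itself, rather than a particular norm group, to serve as the standard.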

In instances where the same performance test may be applied to different populations, it is
appropriate to provide norms for each of the groups. Under ideal conditions, the developmental
process for tests should include an analysis describing the effects of the test among people of
different sex, race, age, training, and experience. During the initial stages of test development,
such analyses may not be possible. However, data of this nature should be obtained as the
applications of the test become more diverse. In situations where equal employment opportunity
laws apply, the data may be required. Furthermore, good practice requires that when subscored
norms are provided, the information regarding the validity and reliability of the resources should
also be included.

Table 5

Some Approaches Towards Obtaining Measurements of Reliability for Performance Tests

Type                     Application                  Test

1. Test-Retest           Second administration        Coefficients of
                         of identical test where      stability.
                         setting may have an
                         effect.

2. Alternative forms     When there is a need         Coefficients of
                         for more than one test       equivalence.
                         to measure the same
                         performance.

3. Single form           When measures are to         Coefficients of
                         be obtained from one         internal consistency.
                         test.

A caution needs to be raised regarding generalizations from too few criteria in a subset. On
performance tests, work at NOCTI has revealed that at least four process and four product
criteria are needed to obtain an acceptable level of internal consistency, a reliability of .90 or
better. Generally, performance test makers have provided too few criteria or too many. Too few
criteria may result in an inadequate measure of the test taker's true score. When too many
criteria exist, the scales become difficult to administer, which may result in increasing the rater's
bias.



Cut Off Scores

There are no universally applicable methods for determining a cut off criterion. Frankly, it
depends on many factors. It may be based on a probabilistic model. The cut off might be related
to supply and demand for a given occupation. In situations where there is a large demand and a
small supply, a more liberal criterion might be used; and the reverse might be considered under
appropriate conditions. If a high degree of skill is required to demonstrate competency, then the
cut off should reflect that level regardless of market conditions.

What is important in establishing a cut off is that the rationale for determining the point be
clearly understood, so that it may be defended if necessary. Once such a cut off point has been
established, the results of examinations should be monitored. This will
ascertain whether or not the measures are providing weights for meaningful decision-making. It
is only through constant reappraisal that appropriate cut off scores can be maintained.

Another concept to remember is that a test's cut off score must be fair: fair to the candidates
taking the examination and to the people they will serve in the occupation. The measures
obtained from a performance test represent an estimate of an individual's performance under a
given set of conditions. They cannot represent all of the characteristics required to perform a
given task adequately. However, if the performance test has been constructed using common
practices, there will be a high probability that scores achieved on the examination will reflect the
individual's ability to perform successfully on the job and in the domain examined.



Test-Related Materials 

In addition to standardizing a test and obtaining measures of reliability and validity, it is
important to provide data about the test to the users. This information may help the user make
decisions about the appropriateness and adequacy of the examination as well as providing
directions for test administration, scoring, and interpreting the results. A manual should be
designed to convey pertinent information to users of the test.

Reference to studies that involved the use of the test should be included, such as studies
concerned with measures of validity and reliability under varying conditions. Any claims made by
the publisher about the test should be substantiated in the information contained in the manual
or by reference in the manual to a source that would support the claim. In addition, information
regarding how the test was developed should also be included.

Since testing is a dynamic activity, these manuals should be revised and updated as research
and conditions warrant. Information regarding how to interpret the examinations should be
included along with warnings to the user regarding possible misuses of the information obtained
through examinations. If the performance test norms were developed for use with a specific
population, they may not be applicable to other populations. Information regarding specific
applications and purposes of the examination should be explicitly stated. The manual should
identify the qualifications needed in order to administer the performance test. The qualifications
of the evaluators are as important as the test itself when it pertains to occupational competency
assessment.

Directions for administering and scoring a performance test should be clear so that the
examination can be similarly conducted in all settings. One problem in preparing examinations is
that the laboratories or shops where tests are conducted are different. Under a strict
standardization process, it would be generally held that candidates taking the examination
should be required to perform the test on the same piece of equipment. Although manufacturers
tend to produce machines of comparable design, tests, out of necessity, will be conducted using
different makes of the same tool. Therefore, in the instructions to the evaluator, a notice should
be given that equipment having similar specifications to the suggested standard may be
substituted. Skilled workers, with a minimum amount of instruction, can function effectively on
equipment manufactured by different companies. In situations where candidates may be
unfamiliar with a specific piece of equipment, they should be given an opportunity, prior to the
examination, to become familiar with the controls of the equipment. They should also be
permitted to operate the equipment for a brief period of time before the start of the test.

When the examiners are required to score their own ratings, there should be procedures
presented in the manual with enough detail to minimize the probability of scoring error. In
situations where the scoring is to be accomplished by a test publisher, it is recommended that
the evaluation rating sheet contain, in addition to the numerical assessment, some space for
general statements or comments by the evaluator regarding the overall performance of each
candidate. This information can be useful as an internal control. A candidate's total numerical
score should be in agreement with general statements made by the evaluator. For example, if
the evaluator's numerical rating turns out to be extremely high, the general comments
should be consistent with this measure. If this were not the case, a follow-up should be initiated
to correct this apparent discrepancy.

Standardized measures of central tendency, standard deviations, standard errors, median,
and validity and reliability and correlation coefficients with their standard errors of measurement
should be contained in the manual with the fundamental data. Demographic data and sample
size from which the data was derived should also be reported in the manual. All of the data
reported in the manual should help the potential user determine the suitability of the test in
terms of the particular application as well as to assist in the interpretation of scores.

Since "skill" is a relative term, each of the criteria selected could be judged on a rating scale
containing at least three levels such as extremely competent, average, and inept. Along with the
basic information contained in the manual, it should be stated that local norms may vary from
the norms that are published in the manual. When populations are large enough, it may be
desirable to have local norms.



In addition to norms for individuals, class or program norms can be derived. The use of
performance tests within a school setting may require the derivation of special measures used for
the purposes of analyzing group achievement as a measure of teacher effectiveness.
Performance tests may be used to assess the performance of a group or program. This requires
using the mean score of each class of students in a given program, instead of individual student
scores. When such norms are provided, the user should be informed of their derivation and
application. Whatever approach is used, it is important to provide the user with the standard
errors so that the precision of the measurement may be understood.
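
A class-level summary of the kind described above might be computed as in the following sketch. It reports the class mean together with the standard error of that mean (sample standard deviation divided by the square root of the class size); the scores and the function name `class_summary` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def class_summary(scores):
    """Return (class mean, standard error of the mean) for one class's test scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to estimate a standard error")
    return mean(scores), stdev(scores) / sqrt(n)


# Hypothetical scores for one class in a program.
class_scores = [70, 80, 90]
class_mean, standard_error = class_summary(class_scores)
print(f"mean {class_mean:.1f}, standard error {standard_error:.2f}")
```

Small classes yield large standard errors, which is one reason the precision of a group measure should be reported alongside the mean.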



Test Revision

Once the test has been used, the test developer must continue, on a periodic basis, to
update and improve the instrument. Although generally the critical competencies within any
occupation do not change radically from year to year, important developments do appear that
must be considered. The magnitude of these changes varies among occupations. For example, in
the printing industry during the last twenty years, there have been tremendous changes in
technology. The same holds true in the field of electronics. However, changes in fields such as
carpentry or masonry tend to occur at a substantially slower rate of development. Therefore, the
rate of change on any performance test measuring competencies in these occupations would not
vary greatly from year to year.

What is the most appropriate time to change jobs on performance tests? Rather than be
completely random regarding when to change some items, NOCTI, for example, deletes a
competency when it is not being used in at least 25 percent of the field and adds new
competencies after the practice has been adopted by at least 25 percent of the field. The 25
percent is an operational standard that can be modified up or down depending on
circumstances. When a replacement job has been selected, if the time required to accomplish the
task and its total value on the test are similar to the item being removed, the change may not
seriously affect the cumulative norms. However, any change within an instrument must be
examined to see whether or not the change could cause a change in the norms and thereby
invalidate the standard. New norms are usually needed when jobs are changed. In cases where
examinations have high reliability, small changes on the test do not appear to alter outcomes. It
is therefore possible to update examinations and use the cumulative norms without necessarily
being too concerned about problems of independence. However, if this practice exists, it is best
to monitor test results to make certain that significant differences do not occur, since what may
appear to be a small change could affect results in significant ways.
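
The 25 percent operating rule described above can be expressed as a small decision function. This is only a sketch of the rule as stated in the text: the function name `review_competency`, its return labels, and the survey figures are hypothetical, and the threshold is left adjustable because the text notes it can be moved up or down.

```python
def review_competency(in_test, share_of_field, threshold=0.25):
    """Apply the usage threshold: return 'delete', 'add', or 'no change'."""
    if not 0.0 <= share_of_field <= 1.0:
        raise ValueError("share_of_field must be a proportion between 0 and 1")
    if in_test and share_of_field < threshold:
        return "delete"      # no longer used in at least 25 percent of the field
    if not in_test and share_of_field >= threshold:
        return "add"         # adopted by at least 25 percent of the field
    return "no change"


# Hypothetical survey results: (currently in the test?, share of the field using it).
survey = {
    "hand-set type": (True, 0.10),
    "photo-typesetting": (False, 0.40),
    "offset press setup": (True, 0.85),
}
decisions = {name: review_competency(*status) for name, status in survey.items()}
print(decisions)
```

Any change flagged this way would still need the follow-up monitoring of norms that the text calls for.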

Performance tests may not cover the latest developments or all of the techniques employed
by individuals engaged in a specific occupation. However, generalizations about a person's skill
can still be valid. Just because someone has knowledge does not necessarily indicate
competency. For example, a student may know all of the latest techniques, and yet some of
these techniques may still have to gain acceptance in the field. The reverse may also be true; the
field may be well ahead of the training institutions. Therefore, the performance measurement
does not reflect competency unless the examination is based on the current practice in the field.
The testing should relate as directly as possible to reality. This direct parallel with the world of
work provides specific information regarding student accomplishments in terms of the needs of
employers.

Since the tasks are based on reality, performance tests can be used to evaluate the quality of
programs as well as to point out areas in need of improvement.



There will be situations that preclude using a direct experience. For example, in the case of flying
an airplane, a simulator may be the best way to test initially for the skill level rather than to permit the
student to directly control the flight of a plane.

When feasible, alternative performance tests should be provided. Several versions of the
same instrument are a help in regard to test security. Although it may be highly desirable to have
other exams, the cost involved in such development is high, and the difficulty of arriving at
equivalent forms sometimes precludes their application.

Performance tests appear to provide measures of achievement that are not biased due to
race or sex. Because of the importance test scores have on the future of an individual and
society, concern is often raised about test bias due to race or sex. NOCTI has found that scores
derived from performance tests tend not to contain these forms of error variance. Test results
should be communicated clearly. This suggests describing the confidence interval around
a test score, rather than just reporting the point estimate of the measure. A report of scores
should be accompanied by all the necessary information required to interpret the measure.
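
A confidence interval around a reported score can be sketched using the classical standard error of measurement, SEM = SD x sqrt(1 - reliability). The chapter does not give this formula, so it is offered here as a common assumption; the score, standard deviation, and reliability values and the function name `score_interval` are hypothetical.

```python
from math import sqrt


def score_interval(score, sd, reliability, z=1.96):
    """Return (low, high) bounds of a confidence interval around an observed score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * sqrt(1.0 - reliability)  # classical standard error of measurement
    return score - z * sem, score + z * sem


# Hypothetical report: T score of 50, norm-group SD of 10, reliability of .91.
low, high = score_interval(50, 10, 0.91, z=2.0)
print(f"reported score 50, interval roughly {low:.0f} to {high:.0f}")
```

Reporting the band rather than the single number makes it plain that two candidates a point or two apart may not really differ.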

A Few. Concluding Comments 

Vocational instructors have always used performance tests. The basic difference betweeh 
their approach and the one described in this chapter is that the test development P^ogedures 
followed by instructors normally result in larger error terms Standardized tests are more likely to 
reduce the size ofnhe error in the student s score «" Therefore, they provide a^mer estimate of 
student's true achleverfient level along with obtaining comparable measures qcross programs. 

The performance test samples an individual's ability to perform jobs and tasks that are
judged to be critical and important within a given occupation. They may take the form of real
work or a simulation of work. Regardless of what form they take, they should be as realistic
as possible. Performance tests provide a way to assess psychomotor skills as well as
an alternative way of examining a person's problem-solving ability. When coupled with other
measures of achievement, they provide valuable insights regarding an individual's ability and a
program's effectiveness.

Performance tests are simply another method of measuring skill attainment. These tests,
themselves, must meet general standards for educational and psychological testing.¹⁰ In the past
this has not been effectively accomplished. With advances in test construction and measurement
theory it is now feasible to create effective and efficient performance instruments. Because
techniques are now available to standardize such tests, their usefulness will continue to be
appreciated, and their application will continue to expand. The age of standardized performance
tests has only just begun, and its future looks promising.




Notes

¹Panitz, A., and Olivo, C. T. Handbook for Developing and Administering Occupational
Competency Tests. New Brunswick, NJ: Rutgers University, 1971.

²Lennon, R. T. "Assumptions Underlying the Use of Content Validity." Educational and
Psychological Measurement (1956): 294-304.

³Anastasi, A. Psychological Testing. 4th ed. New York: Macmillan, 1976.

⁴Cronbach, L. J., and Gleser, G. C. Psychological Tests and Personnel Decisions. 2d ed. Urbana, IL:
University of Illinois Press, 1965.

⁵Klein, R. S., and Pfeiffer, S. "Measuring the Internal Consistency of Selected NOCTI Performance
Examinations." Paper. Albany, NY: National Occupational Competency Testing Institute, 1979.

⁶Forster, F. "The Rasch Item Characteristic Curve and Actual Item Performance." Paper. San
Francisco: American Educational Research Association, April 1976.

⁷United States Equal Employment Commission, et al. "Uniform Guidelines on Employment
Selection Procedures." Federal Register (1978): 138295-38309.

⁸Davis, F., et al. Standards for Educational and Psychological Tests. Washington, DC: American
Psychological Association, 1974.

⁹Thorndike, R. L. Educational Measurement. 2d ed. Washington, DC: American Council on
Education, 1971.

¹⁰Tunkel, L. S. "Occupational Competency for Quality Vocational Education." Monograph.
Columbus, OH: State Education Department, 1979.






TECHNICAL ISSUES:
COMMENTS



Comments on the Technical Issues
in Performance Testing

Samuel A. Livingston
Center for Occupational and Professional Assessment
Educational Testing Service
Princeton, New Jersey



These two papers raise a number of technical issues in the development and use of 
performance tests: 

• How should the test maker select tasks for a performance test? 

• Should a performance test evaluate the student's procedure or only the product of the task? 

• How can the student's performance be translated into a test score? 

• How can we reduce the influence of irrelevant factors on the student's score? 

• What types of reliability are particularly important in performance testing? 

• What types of validity are particularly important in performance testing? 

• How should we set the pass/fail cutoff on a performance test? 

Klein suggests that the main consideration in selecting tasks for a performance test is that
the tasks should adequately sample all the skills that are to be tested. This suggestion is good as
far as it goes; redundancy in testing is often a luxury that performance testers cannot afford. But
what should the test maker do if there is not enough testing time to test all the skills? I would
suggest that there are two considerations: (1) the consequences of allowing someone to remain 
deficient in a particular skill, and (2) the extent to which the skill can be tested by other, less 
time-consuming and less costly methods. 

Klein suggests that both process and product should be evaluated in a performance test.
This advice is usually sound. Concentrating entirely on the product and ignoring the process can
be dangerous, especially when safety precautions are involved. Process evaluation is also
important when a bad procedure yields a bad product only part of the time. But if no safety
precautions are involved and if wrong procedures always show up in the product, an evaluation
of the product may be sufficient. In other performance tests, it may make sense to base the
evaluation entirely on the process. The product of the task may be difficult or impossible to
observe. The quality of the product may depend heavily on factors that cannot be standardized.
Or the product may be a joint effort of two or more persons, only one of whom is being tested.
In these cases, an evaluation based entirely on the process is quite appropriate. But in many
performance tests, it makes sense to evaluate both process and product.






Both papers deal with the problem of converting performance into a test score. Perloff
suggests several weaknesses of rating scales but does not offer any alternative. Klein suggests
that performance testers use rating scales for ranking students, using check lists only when
absolute mastery is essential. Actually, a highly detailed check list may provide enough
information for ranking students, as well as for determining their mastery in an absolute sense.
Also, a check list requires less judgment on the part of the observers and thus reduces the extent
to which the student's score depends on the observer's individual standards (and the observer's
mood at the time of the test). The completed check lists also provide a detailed, descriptive
record of students' performance, for diagnosing students' (and instructors') weaknesses, and for
documentation in case of a disputed score.

Both papers offer several specific suggestions for reducing the influence of irrelevant factors.
In brief:

• Standardize the testing conditions 

• Give the observers detailed instructions 




• Train the observers 

• Use more than one observer if possible 

• Have the observers record their observations as soon as possible after
making them.

One additional technique that is often helpful is to give the observers examples of adequate 
and inadequate performance for each aspect of the task requiring the observer to make a 
judgment. These examples should illustrate borderline cases if possible. That is, the example of
inadequate performance should be nearly adequate, while the example of adequate performance 
should be just barely adequate. Examples of this type provide a clear standard for the observers 
to use in judging the students' performance. 

Reliability is the level of agreement between test scores that would be the same if the scores
were free of the influence of irrelevant factors. In performance testing, the most important of
these irrelevant factors is usually the selection of a particular observer. Therefore, the most
important type of reliability is inter-observer reliability. To determine the inter-observer reliability
of a performance test, you must try it out with at least two observers observing the same
performance. If the test involves an evaluation of the student's procedure, both observers will
have to observe the student at the same time (unless the performance is recorded in some way,
e.g., on video-tape). Other types of reliability may also be worth investigating, e.g., short-term
stability, or alternate-forms reliability (where the alternate forms contain different tasks selected
to test the same skills).
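
As a rough illustration of how a two-observer tryout might be summarized, the sketch below computes simple percent agreement and Cohen's kappa over the checkpoints both observers marked for the same performance. Kappa is a statistic chosen here for illustration and is not named in the papers under discussion; the function names and the ratings are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of checkpoints on which two observers gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:  # both observers used one identical category throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical check list: 1 = step performed correctly, 0 = not.
observer_1 = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
observer_2 = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(percent_agreement(observer_1, observer_2))
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Percent agreement is easier to explain to instructors, while kappa guards against agreement that arises only because nearly every checkpoint is passed.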


Internal-consistency reliability statistics such as "KR-20" or "alpha" are often irrelevant to
performance tests. They should not be applied to the checkpoints on a check list, because the
checkpoints are not a sample from a much larger universe of possible checkpoints. They are not
interpreted in terms of some underlying trait. They represent only what they are: the most
important observable aspects of the task. However, there is one case in which internal-consistency
reliability statistics would be relevant to a performance test. This is the case of a
performance test that contains several tasks, all intended to measure the same general abilities.
In this case, the "items" would be the tasks, not the individual checkpoints.
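
For the multi-task case just described, a minimal sketch of coefficient alpha with tasks as the "items" might look like the following. The data layout (one row per student, one column per task score), the function name, and the scores are assumptions made for illustration.

```python
from statistics import variance


def coefficient_alpha(task_scores):
    """Cronbach's alpha with each task treated as one item.

    task_scores: list of rows, one per student, each row holding that
    student's score on every task (same task order in every row).
    """
    n_tasks = len(task_scores[0])
    if n_tasks < 2 or len(task_scores) < 2:
        raise ValueError("need at least two tasks and two students")
    # Variance of each task's scores across students.
    task_variances = [
        variance([row[t] for row in task_scores]) for t in range(n_tasks)
    ]
    # Variance of the students' total scores.
    total_variance = variance([sum(row) for row in task_scores])
    return (n_tasks / (n_tasks - 1)) * (1 - sum(task_variances) / total_variance)


# Hypothetical scores for five students on three tasks.
scores = [
    [8, 7, 9],
    [6, 6, 7],
    [9, 9, 8],
    [5, 4, 6],
    [7, 8, 7],
]
print(round(coefficient_alpha(scores), 3))
```

Note that the rows hold task totals, not checkpoint marks, which keeps the statistic aligned with the distinction drawn in the paragraph above.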







Validity is the extent to which a test does the job it is being used for. More than any other
kind of test, a performance test is a direct measure of the skills it is intended to test. Therefore,
the most relevant type of validity is content validity. Even the harshest critics of content validity
concede that it is relevant when the information resulting from the test is expressed in "the strict
behavioral language of task performance."¹ However, if we intend to draw inferences about the
student's performance in situations unlike those included on the test, criterion-related validity
may also be relevant. (The concept of "consistency validity" introduced by Perloff does not seem
very different from content validity.)

Perloff is correct in asserting that validity is the most important characteristic of any test. A
test that does not yield valid scores is worthless as a measuring device. However, there is often a
trade-off between validity and efficiency. It may be necessary to sacrifice some degree of validity
to achieve a gain in efficiency, which is what we do whenever we use any form of simulation in a
performance test. Often the most difficult decisions in developing a performance test involve the
trade-off between validity and efficiency. The real world forces us to do our testing with limited
resources (time, money, and so forth) and without risking the safety of the students or other
persons. Validity is the main thing, but it is not the only thing.

More than any other type of testing, performance testing offers an opportunity to choose
cutoff scores in a way that most experts would acknowledge as correct, or even "optimal." Any
method of choosing a cutoff score involves judgment. What is important is that the judgments
must be made in a way that assures their meaningfulness and that they must be made by
persons who are qualified to make them. Probably the most meaningful type of judgment is the
direct judgment of examples of performance as acceptable or unacceptable. In most other kinds
of testing, it is difficult to get meaningful overall judgments of students' performance; in
performance testing it is easy. Judges' standards will vary, but these differences will tend to
"average out" if several different judges participate in the process. By analyzing the students'
test scores together with the judgments of their performances, we can estimate the probability that
a student with a given test score would be judged (by a randomly selected judge) to have
performed acceptably.
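
The simplest version of that estimate is the observed proportion of "acceptable" judgments at each score level. The sketch below assumes the tryout data arrive as (test score, judged acceptable) pairs, one pair per judgment; the function name and the sample records are hypothetical, and a real analysis would probably smooth the proportions across neighboring score levels.

```python
from collections import defaultdict


def acceptance_rate_by_score(judgments):
    """Estimate P(judged acceptable | test score) from raw judgments.

    judgments: iterable of (score, judged_acceptable) pairs, where each
    pair is one judge's verdict on one student's performance.
    """
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for score, judged_acceptable in judgments:
        totals[score] += 1
        if judged_acceptable:
            accepted[score] += 1
    return {score: accepted[score] / totals[score] for score in sorted(totals)}


# Hypothetical tryout: several judges rated performances at three score levels.
records = [(2, False), (3, True), (3, False), (4, True), (4, True)]
print(acceptance_rate_by_score(records))
```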

To use these probability estimates to set a cutoff score, we (i.e., somebody) must make one
other type of judgment. There are two types of decision errors we can make. We can pass a
student who deserves to fail, and we can fail a student who deserves to pass. What is the
relative seriousness of these two types of errors? We will never be able to eliminate decision
errors completely, as long as there is any test score at which some students are judged
acceptable and others unacceptable. The best we can hope to do is to minimize the total harm
from the errors we will make. When we know the probability of each type of error at any given
test score level, and the relative seriousness of the two types of errors, we can choose a cutoff
score that is "optimal" in this sense.
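
The reasoning above can be sketched as a small search over candidate cutoffs. This is only an illustration under stated assumptions: students at or above the cutoff pass, the number of students expected at each score level is known, and the two harm weights are supplied by whoever makes the seriousness judgment. All names and numbers are hypothetical.

```python
def expected_harm(cutoff, levels, harm_false_pass, harm_false_fail):
    """Total weighted decision errors if students scoring >= cutoff pass.

    levels: list of (score, n_students, p_acceptable) tuples.
    """
    harm = 0.0
    for score, n_students, p_acceptable in levels:
        if score >= cutoff:
            # Passed students who would have been judged unacceptable.
            harm += harm_false_pass * n_students * (1.0 - p_acceptable)
        else:
            # Failed students who would have been judged acceptable.
            harm += harm_false_fail * n_students * p_acceptable
    return harm


def best_cutoff(levels, harm_false_pass=1.0, harm_false_fail=1.0):
    """Return the candidate cutoff with the smallest expected harm."""
    scores = sorted(score for score, _, _ in levels)
    candidates = scores + [scores[-1] + 1]  # the last candidate fails everyone
    return min(
        candidates,
        key=lambda c: expected_harm(c, levels, harm_false_pass, harm_false_fail),
    )


# Hypothetical score levels: (score, students, probability judged acceptable).
levels = [(1, 10, 0.05), (2, 20, 0.20), (3, 40, 0.40), (4, 20, 0.80), (5, 10, 0.95)]
print(best_cutoff(levels))                       # both errors equally serious
print(best_cutoff(levels, harm_false_fail=3.0))  # failing a deserving student is worse
```

Raising the weight on one type of error moves the cutoff in the direction that makes that error rarer, which is the trade-off the paragraph describes.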




¹S. J. Messick. "The Standard Problem: Meaning and Values in Measurement and Evaluation."
American Psychologist 30 (1975): 955-66.






CHAPTER FOUR 






LEGAL ISSUES 






Questions frequently arise in the field of education that often find themselves to be part of broad
legal issues affecting the delivery of all types of educational services. The institutionalization of
performance testing in vocational education programs, for example, brings with it a series of
legal concerns to which teachers and administrators must be sensitive. Paul L. Tractenberg's
paper opens Chapter Four with an overview of the legal implications of performance testing. He
begins by identifying the major legal provisions (due process, equal protection clauses, state
education clauses, federal and state education statutes, federal and state regulations) which may
prove relevant to performance testing. Tractenberg then applies the legal theories to a series of
keynotes on minimum competency testing that have been adapted to performance testing in
vocational education.






The second paper in this chapter, by Diana C. Pullin, identifies lessons to be learned from the
minimum competency testing movement. She discusses the question of accountability through
performance testing from a legal perspective and then focuses on several legal areas which
should be of concern to vocational education. The question of fundamental fairness is raised, as
is the fundamental flaw in one minimum competency testing program, and some recommendations
for fundamental fairness in vocational education performance testing programs are
identified. The paper then discusses the potential for unlawful discrimination and the right to
privacy. Finally, recommendations are offered to the reader.

William G. Buss provides another look at the legal issues from a third perspective in the
Comments paper. He emphasizes "some of the legal ambiguity and related interaction between
law and education that is involved in the material considered in these papers."








Legal Implications of Performance Testing
in Vocational Education: An Overview

Paul L. Tractenberg
Rutgers School of Law
Newark, New Jersey

During the past several years, the minimum competency testing movement has swept across
the country. It has left controversy in its wake. Proponents laud its potential as a vehicle for
increased educational accountability;¹ critics attack its basic premises and the feasibility of
implementing it effectively.² Some observers believe that the movement has already peaked, and
that the educational reform pendulum will begin to swing in the opposite direction.³ For the
moment, though, some form of minimum competency testing program is in effect in about
three-fourths of the states.⁴ The implications of these programs for school systems, for education
professionals, and for students are potentially great.

One arena in which those implications are being explored is the courts. Students and
parents have sought judicial intervention to prevent injuries that they allege will result from
particular minimum competency testing programs. The first important decision, the Florida
federal district court's decision in Debra P. v. Turlington,⁵ has been handed down. Several other
significant cases are pending,⁶ and more are certain to be filed. The impact of these cases on the
present and future status of minimum competency testing is likely to be substantial.

Judicial involvement in matters of pupil assessment is not new.⁷ To a considerable degree,
the courts have sought to defer to the educational authorities where the issues raised by the
cases were whether the assessment instruments were appropriately developed or administered,⁸
or their results were appropriately used.⁹ But in some cases, the constitutional rights of students
were so clearly and substantially implicated, or the actions of the educational authorities were so
deficient, that the courts saw no alternative but to intervene.¹⁰

It is against this backdrop that the use of performance testing in vocational education must
be considered. Performance testing in vocational education has significant parallels to minimum
competency testing in general education. Indeed, the momentum generated by the latter
undoubtedly has contributed to increased interest in the former; peaking of the minimum
competency testing movement, or adverse court decisions, therefore, would have implications for
performance testing. But performance testing in vocational education has a history and relevance
which are independent of the minimum competency testing movement.

This paper has three purposes, each the subject of a separate section: (1) to provide a brief
overview of legal principles and provisions that are likely to be relevant to performance testing in
vocational education; (2) to describe the major policy decisions involved in developing a
performance testing program and to assess the legal implications of each; and (3) to predict
legal developments and consequent policy directions. The work of Brickell in articulating the
seven keynotes of competency testing¹¹ and of Ahmann in applying them to performance testing
in vocational education¹² provides a convenient organizing framework.






An Overview of Relevant Legal Provisions

There are seven categories of legal provisions that may prove relevant to performance
testing developments. Four are constitutional in origin: federal and state due process clauses;
federal and state equal protection clauses; federal and state clauses protecting privacy and
freedom of belief; and state education clauses. The fifth is statutory: those provisions of federal
and state statutory law that directly or indirectly bear upon the establishment and operation of
performance testing programs in vocational education. The sixth is the
rules and regulations of the federal and state education authorities. The seventh is the common
law, the legal principles evolved through the litigation process. Each of those sources of law will
be considered briefly.

1. Federal and state due process clauses. The Fourteenth Amendment to the Federal
Constitution and most state constitutions contain a due process clause. The federal clause
provides that no state shall "deprive any person of life, liberty or property, without due process
of law." The judiciary has construed due process to have substantive and procedural aspects.

Substantive due process, still in existence although significantly diminished in legal
importance, requires that the action of the state be rational and reasonably related to a
legitimate state objective. If, for example, it could be proven that performance testing was
evaluating students on materials or skills never taught in the vocational program, students who
failed on that test to demonstrate their proficiency might credibly assert a violation of their right
to substantive due process.

Procedural due process requires that the state act in a fair manner when it deprives a citizen
of liberty or property. In connection with a performance testing program, procedural due process
might require, for example, a procedure under which students with "failing" scores be permitted
to challenge the scoring of the test, the qualifications of the test administrators, or the validity of
the test itself. It might also require adequate phase-in time for a performance testing program
that imposed substantial sanctions. The absence of adequate phase-in time was the basis
for the Debra P. court's four-year deferral of the diploma sanctions under the Florida minimum
competency testing program.

Both substantive and procedural due process require a showing that a person has been
deprived of liberty or property by action of the state. Students could assert that denial of a
diploma, or of promotion or graduation, or of full access to the job market, as a result of
performance testing constitutes a deprivation of "property." Courts have found that students
have a property interest in their education such that physical exclusion from school, even for a
short time, requires due process procedures. In the Debra P. case, the court found that
students would be deprived of a property interest by a minimum competency testing program
that determined whether they would be graduated.

The Debra P. court also found that the minimum competency testing program deprived
"failing" students of a liberty interest by stigmatizing them as incompetent or ineligible for
promotion, graduation, or a regular diploma.

It should be remembered, though, that proof of deprivation of a liberty or property interest in
itself does not condemn the state's action; it obligates the state to act fairly and rationally.
Indeed, during the past several years there has been something of a trend in the federal courts to
expand governmental prerogatives and discretion, and to afford correspondingly reduced judicial
protection to aggrieved citizens. At least some state courts have resisted this trend in
interpreting their state constitutional due process clauses.

2. Federal and state equal protection clauses. Equal protection is a constitutional principle
related to due process. Both require governmental rationality and fairness in treatment of
citizens. The federal equal protection clause also derives from the Fourteenth Amendment. It
prohibits the state from denying "to any person within its jurisdiction the equal protection of the
laws."
The equal protection clause tends to focus on state action with respect to groups rather than
individuals. A challenger of state action must show that it classifies persons and provides
differential treatment to them without adequate justification. In the federal courts, as well as in
some state courts, the burden of justification required of the state for differential treatment
increases with the importance of the interest subjected to such treatment. The courts speak of
"fundamental" interests imposing upon the state the burden of showing a compelling reason for,
and no available alternative to, the differential treatment. This "strict scrutiny" approach is also
invoked by "suspect" classifications, such as those based upon race. Interests of lesser
importance or classifications not based on a suspect characteristic result in a lesser burden on
the state, perhaps only the need to prove that the classification is rational, even if not the best
means to achieve the state's objective. In recent years, an intermediate approach has been
developed to deal with certain kinds of cases, and a "sliding scale" approach, in which the
importance of the citizens' interest is balanced against the significance of the state's justification,
has been advocated.

An equal protection challenge to performance testing in vocational education likely would
proceed along one or both of the following lines: (1) that, to the extent black or Hispanic
students were disproportionately represented among those failing to demonstrate proficiency,
the program classified students racially or ethnically, a suspect classification, and should be
subjected to strict scrutiny; or (2) that the program lacked even a rational basis because, for
example, the test was invalid or covered material or skills not taught in the schools. The
argument that strict scrutiny should be applied because of the fundamental nature of education
is unlikely to succeed in the federal courts. The United States Supreme Court ruled to the
contrary in 1973. Several state courts have reached a contrary conclusion, however, under
state equal protection clauses.

Recent U.S. Supreme Court decisions also have created problems for an equal protection
challenge based upon racial or ethnic discrimination. The Court has ruled that a statistically
disproportionate effect, while relevant, is insufficient to demonstrate a racial or ethnic
classification. Challengers of state action must prove, by direct or circumstantial evidence, that
there was an intention to create such a classification. That may be a formidable task in the
context of a performance testing program. On the other hand, if the particular state or school
system previously has engaged in unlawful discrimination, it may have an ongoing duty to
eliminate the effects of that prior discrimination. In that situation, even a neutral classifying
device could be found deficient.

3. Federal and state freedom of belief and privacy provisions. The scope and content of
some performance tests may raise significant issues under the First Amendment's right to
freedom of expression and belief and the Fourteenth Amendment's implicit right to privacy, and
their state constitutional counterparts. These problems will arise primarily from the inclusion in
performance tests of items that assume or inquire into values, attitudes, or characteristics
considered relevant to job success, such as punctuality, respect for authority, and ability to get
along with co-workers. There are two areas of concern: (1) in order to demonstrate proficiency,
the student in effect must subscribe to certain values; and (2) the student may be required to
reveal confidential personal matters.

There are no judicial decisions which provide definitive guidance about how these issues will
be resolved in a challenge to a performance testing program. An important line of Supreme
Court decisions does afford students some protection against the efforts of school
authorities to have them believe in a certain way or to express certain beliefs. In performance
testing, however, student values, attitudes or characteristics may be highly relevant to predicting
success on the job. To the extent that such a prediction is an important element of performance
testing, eliminating it might limit the validity of the testing. Thus a court will have to balance the
respective interests carefully.

Another significant issue relates to the confidentiality with which performance test results
are treated. If the results are kept confidential, the intrusion into a student's privacy is minimized
somewhat. But an important purpose of performance testing is to provide prospective employers
with information about the abilities of applicants. The invasion of privacy problems may be
minimized if the students have to approve the dissemination of performance testing results.
Ultimately, however, the court may have to confront the question of whether there are limits to the
state's power to inquire about a student's personal views and beliefs. It will do so by balancing
the invasion of privacy occasioned by the testing program against the state's purpose in
implementing the program.

4. State education clauses. Every state now has in its constitution a commitment to provide
school-age residents with a free public education. About three-quarters of the clauses describe,
to some extent, the required education. These clauses may be relevant to, or the basis of, a
variety of performance testing challenges. For example, the absence of a performance testing
program might provoke a challenge based on the state education clause. The argument could
proceed as follows in a state with a "thorough and efficient" clause: The clause obligates the
state to provide an educational program designed to equip students to function as citizens and
as competitors in the labor market; proficiency in vocational skills is essential for those
purposes; establishment of a performance testing program is necessary to ensure that all
students have an adequate opportunity to achieve such proficiency.

State education clauses may also support challenges to particular performance testing
programs. For instance, the levels at which proficiency standards were set could be challenged
on the ground that they were not consistent with the state's obligation to provide a high quality
or "thorough and efficient" education, especially if those education clause requirements had
been construed to relate to the students' capacity actually to function in the postsecondary
school work world. Challengers might argue that the standards were too low; performance at
those levels would not permit students in fact to function adequately in employment.

Another type of education clause challenge could be brought against a performance testing
program that required or permitted different standards to be established by different vocational
schools. Some education clauses expressly mandate a "general and uniform" system of
education for the state; others have been interpreted to require uniformity across district lines.
Arguably such clauses would be offended by a performance testing program that permitted a
student's graduation or diploma to depend upon the district of residence or the school attended.
On the other hand, educational home rule is a well-entrenched tradition in many states,
including, paradoxically, some with uniformity clauses.






Finally, a state education clause challenge might be directed at the inadequacies of remedial
programs for students who fail to demonstrate their proficiency. If a state has defined its
educational mission to include pupil proficiency in vocational skills, then it must take reasonable
steps to carry out that mission. Effective remedial education, once student deficiencies have
been identified, is an important element.

5. Federal and state education statutes. Although at the present time there is no legislative
parallel involving performance testing in vocational education to the minimum competency
testing movement, statutes may be enacted which specifically provide for performance testing. In
that event, the requirements of those statutes may provide legal bases for challenging particular
performance testing programs. A possible line of legal argument is that the program, as
implemented, does not comport with the statutory requirements. Alleged noncompliance may
take many forms, ranging from blatant failure to meet specific requirements (e.g., failing to
institute testing by a date specified in the statute) to more complex issues of qualitatively
inadequate programmatic efforts (e.g., failing to provide educationally sufficient remedial
programs for students who fall below the proficiency standards). Several legal challenges to
minimum competency testing programs have raised these sorts of issues. For example, in one
case, the challenge is based upon the school system's alleged failure to comply with a specific
statutory requirement to obtain parent, teacher, and student participation in the formulation of
the program.

Other more general provisions of federal and state education laws may be relevant, too. For
example, there are statutes that provide guidelines for the operation of vocational programs,
that bar racial or ethnic discrimination in education, that provide for certain access to pupil
records, that assure citizen participation in educational policy making and governance, and
that regulate the education of special groups of students.

Statutory challenges to performance testing efforts are likely to be narrower and focused on
more specific aspects than constitutional challenges. By asserting a specific legislative standard,
they will tend to reduce the court's concern about whether it may be substituting its judgment for
that of another branch of government.

6. Federal and state regulations. Under many of the statutes referred to above, the
responsible administrative agency has promulgated formal regulations or has issued interpretative
guidelines. In some states education regulations formally promulgated by state education
authorities have the force of law. They can form a direct basis for legal challenges relating to
performance testing programs in much the same way as statutes. Indeed, because regulations
tend to deal with educational programs in greater detail than do statutes, they may provide a
stronger basis for legal action. The more specific and detailed the prescription by a legislature or
state education body, the more limited and mechanical the judicial intervention can be.

If, for example, state regulations provide in detail for a performance testing program
pursuant to the authority of a more general statute, failure of the state or of the local vocational
agency to implement that program fully can be challenged. The educational authorities may
defend by asserting that despite the specificity of the regulations they should be permitted some
flexibility, or they may seek to modify the regulations, or they may argue that the challengers
have to exhaust available administrative remedies. All of these, however, are matters well within
the traditional competency of courts to resolve.

In states where administrative regulations are not given the force of law, or in the case of
administrative action, such as guidelines or policy statements, not having the status of formal
regulations, the substance of the administrative judgment should still have weight in legal
proceedings. It represents the expert view of the state's educational authorities. As such, a court
likely would find it highly relevant to interpretation of broad constitutional or statutory
provisions.

7. The "common law." The final source of law that may be influential in judicial consideration
of a performance testing program is the "common law." Under the Anglo-American legal system
this is judge-made law. Courts will tend to follow prior judicial decisions in similar cases under
the doctrine of stare decisis. In confronting a new case, therefore, a court will consider, along
with relevant constitutional, statutory and regulatory provisions, the judicial precedent, especially
cases decided in the same jurisdiction.

Many bodies of precedent are relevant to performance testing programs in vocational
education. For example, as indicated previously, federal and state courts have dealt extensively
with, and given content to, constitutional concepts such as due process rights of students, equal
protection aspects of pupil classification by testing, educational segregation, and equality of
educational opportunity. In many states related education statutes and regulations have been
judicially construed. Beyond those possibilities, the courts have established certain legal rights
independent of constitutional, statutory or regulatory provisions. Thus students are entitled to be
tested in a careful and appropriate manner by those who owe them a duty of care. School
authorities that have failed to do so may be held liable for their negligence.

Applying the Legal Theories to Performance Testing 

Performance testing programs in vocational education may evolve in various ways. The
differences in approach may be based upon differing perceptions as to what are the best public
and social policies, program, administrative structure, use of available resources and
relationships to the job market. The purpose of this paper is to urge that legal considerations
should play a significant role in the development of performance testing. As a point of
departure I will use Brickell's seven keynotes as Ahmann has adapted them to performance
testing in vocational education. Ahmann also has added the "who" question at each stage in the
developmental process. Thus, the keynotes become:

1. The skills and characteristics to be tested

2. The means of measuring them

3. The point(s) at which they will be measured

4. The number of proficiency standards which will be set

5. The level(s) at which these standards will be set

6. Whether the standards will be for school programs or students

7. The consequences of failing to achieve the standards

8. For all of the above, who will make the decision.






The skills and characteristics to be tested. A number of interrelated questions are raised by
this keynote. They include the following: Are the skills and characteristics derived from the
substance of the vocational education subjects? Are they derived from specific jobs (through job
analyses) to which the vocational education subjects are related? Are they derived from
categories of jobs? Are they derived from a broader idea of professional preparation, including
Ahmann's concepts of "occupational knowledge" and job-seeking skills? Are all the relevant
skills and characteristics measured or a sample of them? Are values and attitudes to be
included? Are general competencies to be included in the performance test or are vocational
students required to take the separate minimum competency test used in the general educational
program? Who determines the skills and characteristics to be tested (e.g., educators, employers
or unions, students, parents or other citizens, or some combination of those)?

Consideration of the legal implications of the various alternatives may influence the policy
decisions. In general, the most relevant legal theories are the substantive due process concept of
rationality, the equal protection concept of nondiscrimination, the freedom of belief and privacy
concepts, and the state constitutional, statutory, and regulatory requirements of a certain quality
or quantum of education.

On one level, focusing on skills derived directly from vocational courses may comport easily
with due process and equal protection concepts as long as: (1) the performance testing relates
to subject matter that the students actually have had a reasonable opportunity to master; and (2)
the selection of subjects taught or chosen for the performance testing is nondiscriminatory (in
the sense that it is not skewed in favor of particular socio-economic, racial, or ethnic groups).

However, focusing on skills derived directly from vocational courses may pose greater legal
difficulties under other concepts. State educational quality requirements, as well perhaps as
substantive due process, may dictate that proficiency be defined in terms of skills actually
required in the marketplace. In theory, vocational courses, more than any other school subjects,
should be related to the marketplace. But that may not always be the case.

If the skills upon which the performance testing is based appear to be reasonably related to 
the job market in some sense, it is unlikely that a court will intervene because the skills are 
derived from categories of jobs rather than individual jobs, or from a broader idea of professional 
preparedness, or represent a sampling of relevant skills rather than all the skills involved. These 
are judgments about which the judiciary will tend to defer to the education officials, assuming
that there is credible evidence that the task has been approached responsibly. 

The courts are more likely to consider intervention if the performance testing gives
substantial weight to personal values, attitudes, and other characteristics in addition to, or
instead of, job-related skills. The risk of subjectivity and, ultimately, bias may be heightened by
such an approach. Moreover, issues involving freedom of belief and privacy may be raised.
Justifying the inclusion of such elements, therefore, is likely to be more complicated. On the
other hand, if the educational authorities can demonstrate empirically that certain personal
characteristics are closely related to successful performance on the job, they may be able to
argue that the predictive validity of the performance testing is linked to inclusion of such
elements. The courts will have to balance any infringement upon students' interests against the
weightiness of the state's purpose.

The relationship between performance testing in vocational education and minimum
competency testing raises further legal issues under the state's educational quality provisions.
Generally, courts that have construed the state's obligation under such provisions have
concluded that students have a right to an educational opportunity designed to equip them for
effective citizenship as well as for competition in the marketplace. Mastery of basic academic
skills may be relevant to both. Therefore, students in vocational programs will have to be
included in the general minimum competency testing program, as they are in most states.

Finally, who determines the skills and characteristics to be tested may have legal
implications. Certainly, the requirements of rationality under due process and equal protection
concepts must be satisfied. Statutes or regulations might specify the procedures to be used in
creating the performance tests, and their mandates would have to be met. Moreover, if the
decisions actually were made by persons or agencies not officially part of the governmental
structure, issues of improper delegation of authority would be raised.

The means of measurement. Brickell suggested four broad choices for measurement of
student competencies that are applicable to performance testing in vocational education: (1)
actual performance in job situations; (2) simulated performance in situations resembling the job;
(3) performance in school programs; and (4) performance on paper-and-pencil tests.

The touchstone for evaluating these alternatives is the concept of validity. Under both due
process and equal protection doctrine, tests, of whatever type, must satisfy standards of
objectivity, reliability, and validity. Due process is implicated if the use to be made of the test
threatens to deprive students of their rights to liberty or property. Evidence that the use of the
test stigmatizes students who fail to demonstrate their competence or requires their attendance
at remedial programs will be germane to an alleged deprivation of their liberty interest. Denial
of promotion or graduation based on the test results is the clearest support for deprivation of a
property interest. Even if a court could be persuaded that some students had been deprived of
their liberty or property rights, the students still would have to prove that the test or related
procedures were not procedurally or substantively fair.

An equal protection challenge would proceed most forcefully if a suspect classification were
evident. At one point, a test's racially disproportionate effect (a far higher percentage of black
than white students falling below proficiency levels) established a prima facie case of racial
discrimination sufficient to shift a heavy burden of justification to the education authorities.
Several years ago, however, the United States Supreme Court determined that an intent to
discriminate, rather than merely a discriminatory effect, had to be proven in order to establish a
racial classification. An intent to discriminate can be proven by circumstantial evidence,
including statistical data, as well as by direct evidence. It is still not clear, however, how heavy
a burden that will place upon challengers of a performance testing program.

If, despite racially disproportionate consequences, no suspect classification can be
established, and if the federal courts adhere to the view that education is not a fundamental
interest, then the classification of vocational students into those who have achieved proficiency
and those who have not can be justified by showing that it has a "rational basis." The validity of
the testing instrument will still be part of the showing of rationality but the overall burden on the
school authorities will be substantially lighter than under a stricter scrutiny approach. That is
especially true given the recent tendency of the federal courts to defer increasingly to public
officials' judgments.

However, even if the performance testing is found to be racially neutral it may still be
invalidated if the state or local educational system previously was found to discriminate against
students and the effect of the testing is to perpetuate the effects of past discrimination. The
federal district court in Debra P. found this to be the case with the Florida minimum competency
testing program. Instead of invalidating the program, though, the court merely deferred
effectiveness of the diploma sanction.






The Debra P. court also dealt with many claims of test invalidity. In one of the weaker
portions of the opinion, it concluded that although the test was flawed in many respects the
inadequacies did not rise to the level of "constitutional infirmities." This may suggest that if
educational authorities can present evidence that they have attempted to deal with test validity
concerns their efforts will not be struck down because they have fallen somewhat short of the
"state of the art."

Applying these legal principles to the broad choices outlined by Brickell indicates that, in a
general sense, paper-and-pencil tests may be more easily defended than the other alternatives.
Although it may be more difficult to establish their predictive or face validity, the courts have not
usually required such validity. Paper-and-pencil tests may be easier to validate in content or
construct terms, and this is the direction of the court's primary focus. Moreover, paper-and-pencil
tests may minimize the more obvious problems relating to objectivity and reliability that
could plague tests based on actual or simulated performance. Ahmann has described the
difficulties, in terms of resources and personnel capability, that would have to be surmounted to
develop and administer effective tests of actual or simulated performance. The courts will have
little difficulty striking down a jerry-built performance test. This is not to suggest that, being the
avenue of least legal resistance, paper-and-pencil tests should automatically be adopted. It does,
however, reflect one of the realities that must enter into the decision.

The points for measurement. The purpose or purposes of the performance testing will
determine, to a substantial degree, when the testing is carried out. The testing may serve a
screening function for entry into a particular vocational program. In that event, of course, the
test would be given prior to entry into the program. If, on the other hand, the performance
testing serves a certification function, it may be administered at or near the end of the vocational
program. Finally, if the purpose is diagnostic and remedial for individual students, programs, or
both, the testing will be administered periodically during the course of the program.

These purposes are not mutually exclusive. The choice of testing purpose and the related
decision about points for measurement will be influenced by legal considerations. If entry into a
vocational program is at issue, and the screening will disproportionately affect particular groups
of students, equal protection questions will be raised. Due process questions may also be raised
about whether the performance testing is an arbitrary means of screening individual students.



Central to both sets of questions are the validity of the particular performance test,
discussed in the prior section, and the intention of the responsible education officials. In the
latter connection, vocational educational professionals may have to deal with the argument that
they attempt to limit entry into their programs to students who will be easiest to place in jobs.
Critics have asserted that this has led to discrimination against black, non-English speaking and
handicapped students.

Similar legal issues will be raised if the purpose of the performance testing is certification.
The sanction there may be withholding of promotion, graduation or a "regular" diploma, or
identification of students as "lacking proficiency." The effect, in any case, may be ineligibility for,
or reduced access to, future educational or employment opportunities.

Because of the weightiness of these consequences, the vocational education authorities'
justification is likely to be subjected to careful scrutiny. This will include attention to the timing
of the measurement. There should be adequate notice of the performance expectations and
sufficient time and opportunity for students to meet them. A court that considered these matters
also probably would require testing early enough to permit remedial efforts for students found to
lack the necessary performance skills.







If performance testing were for diagnostic and remedial purposes only, the burden of
justification would be lightest. When and how frequently the testing was administered would be
left to the discretion of the education officials, unless that discretion was exercised in a
manifestly arbitrary or irrational way and some tangible harm to students could be proven. In this
connection, the state education clause's quality standard might become relevant. Students might
argue that the harm to them was that the program as structured could not provide them with
adequate diagnostic and remedial efforts.

The number of proficiency standards set. There is a considerable range of policy possibilities
concerning this matter. There could be a single statewide standard for all students in a particular
type of vocational program, or there could be a separate standard for each student based upon
perceived abilities, background and educational objectives. Between those poles are other
possibilities: multiple statewide standards categorizing students by one or more of a number of
possible criteria (i.e., demonstrated or projected intelligence, facility with English, existence of a
handicap, socioeconomic background, nature of the particular school or school district and the
community and job market that it serves, and the educational expenditure level); either single or
multiple standards established region by region, district by district, or school by school for
students within those respective jurisdictions; a combination of one or more statewide standards
augmented by additional and perhaps higher standards established locally.

Various educational and policy problems are posed by these alternatives. For example, a
single statewide standard for all students in a particular vocational program may be seen as both
too difficult and too easy given wide variations in student ability and performance and, perhaps,
in the varying demands of the marketplace. Differential standards require that each student's
capacity be estimated, with the dual problems of the possible subjectivity of such estimates and
the self-fulfilling prophecy phenomenon.

Moreover, if testing were designed to certify that students had achieved adequate proficiency
to perform in the marketplace, such differentiation would deprive the certification of uniform
meaning even at the lower end of the scale.

These sorts of educational and policy problems have legal analogs. A single statewide
proficiency standard could be challenged on a number of grounds. If it failed to relate
adequately to the demands of the job market, it could be challenged for lack of conformity with
the state's educational quality responsibilities, or for its arbitrariness under due process notions.
If a single statewide proficiency standard had a sharply different impact on
groups of students, especially those defined by race or ethnicity, an equal protection challenge
might be forthcoming.

Resorting to multiple standards would not necessarily eliminate these legal concerns.
Illustratively, if performance expectations for minority students were consistently and
substantially reduced, although those students might be "certified," such an approach could
stigmatize them, lower the program's expectations for them, and deny them access to remedial
programs designed to elevate their proficiency levels. The consequence of these factors might
actually be to diminish the job prospects of minority graduates of vocational education
programs.

Differential standards could also raise substantial due process issues regarding the
arbitrariness or irrationality of the standards themselves and of the mechanism by which they
were set. The strength of this challenge would depend upon the care exercised by the
responsible education authorities. If, for example, standards were established for each pupil by
an individual teacher acting impressionistically rather than on the basis of carefully articulated
criteria, the system would be very vulnerable.

The level at which proficiency standards are set. A question related also to the number of
standards set is whether standards ostensibly established to reflect the demands of the
marketplace are for entry or journeymen-level positions. As a practical matter, unless a particular
program is specifically designed to equip its students for journeymen positions, the standards
should be geared to entry level positions. The more important issue is likely to be whether the
standards actually relate to the marketplace.

There is evidence that in many vocational programs, instruction may not be effectively
geared to the job market. If that tendency were extended to performance testing standards,
there would be clear policy and legal problems. The performance testing effort could be attacked
on the due process ground that it was not rationally related to the state's avowed purpose of
equipping students to compete in the job market. Moreover, if the level at which standards were
set did not comport with the marketplace, a state education clause challenge might lie. Finally,
standard setting raises the issue of who makes the operative decision. It is inconceivable that
standards could reasonably relate to the demands of the job market without the standard-setting
process substantially involving representatives of the market in question. Nonetheless, from a
legal perspective, the ultimate decision must be made by the responsible public officials.
Otherwise, the standards are subject to challenge on the basis of an unlawful delegation of
authority.

Whether the standards will be for school programs or for students. Thus far, this paper has
proceeded primarily on the assumption that performance testing standards will be established for
students rather than for school programs. This orientation is not inevitable. A performance
testing program might be established to determine how well vocational schools or programs are
performing on the whole.

The practical differences between these two approaches are substantial. As Brickell pointed
out in connection with minimum competency testing, the choice between them will determine:

". . . whether you will write test items all students can pass or only most students can
pass; whether you will test everybody or only a sample; whether you will report results
to each individual parent or only to the general public; whether you will settle for a
school program that reaches 70% of the students even if that 70% misses, for example,
every single 'disadvantaged' child; and whether you will modify every unsatisfactory
program or fail and recycle every unsatisfactory graduate."

A focus on schools and their programs will reduce some legal difficulties but may increase
others. To the extent that such a focus would reduce or eliminate sanctions against individual
students or groups of students (i.e., by not denying them promotion, graduation, or regular
diplomas, or by not publicly identifying them as below proficiency levels), due process and equal
protection concerns would be lessened. Arguments based on deprivation of a liberty or property
interest, or on invidious discrimination, would be far less credible. The thrust of performance
testing would be on school or program accountability and the response to inadequate
performance presumably would be a programmatic or personnel-oriented response.

That may be a rational and appropriate approach unless the state's constitution, statutes or
regulations impose a clear educational quality requirement directed to the rights of each student.
In that event, as previously discussed, a performance testing effort which was not designed to
ensure that each student had an educational opportunity geared to the achievement of
reasonable proficiency in job-related vocational skills would be suspect. Failure of the program
to lead to special educational assistance for individual students who fell below the specified
standards would be the clearest indication of its invalidity.

The consequences of failing to achieve the standards. This final keynote follows directly
from the prior discussion. In connection with minimum competency testing Brickell suggested
six possible consequences for students who fell below minimum competencies and six parallel
consequences for schools whose students failed to perform adequately. They were:

1. Verify the findings independently

2. Provide several more chances

3. Lower the standards to meet their performance

4. Remediate so that they can pass (or redesign school programs to match successful
programs)

5. Refuse to promote or graduate them (or refuse to let schools operate until they can meet
the standards)

6. Promote or graduate them with a restricted diploma or certificate of attendance (or let
schools operate but refuse to accredit them).

In applying these possibilities to performance testing in vocational education, the prior
discussion made clear that the preferable, and in some states the required, response to evidence
that particular students have failed to meet proficiency standards is to direct appropriate
educational assistance to them. This may take the form of remediation for the individual
students; it may also involve broader programmatic or personnel responses. Surely if a
substantial percentage of the school's or program's students is failing to meet statewide or local
standards, the overall educational program, including the quality of instructional staff, should be
evaluated and perhaps upgraded.

Lowering the performance testing standards because "too many" students have failed to
meet them is an unacceptable response for both public policy and legal reasons.

If students who fail to meet the standards are provided with appropriate remedial assistance
and if the program is otherwise fair and rational, then ultimately they could be refused
promotion or graduation, or be promoted or graduated with a restricted diploma or certificate of
attendance. From a due process perspective, these students may have been deprived of a liberty
or property interest by that action but the state is permitted to do so if it acts fairly and
rationally. From an educational quality perspective, the state cannot be required to guarantee
educational results for all students. It can be held, however, to provide an appropriate
educational opportunity for all students.

Vocational educational results, as measured by an effective performance testing program,
are relevant to a determination of whether that educational opportunity is appropriate in legal
terms. Evidence of inadequate pupil performance should shift to the education authorities the
burden of demonstrating that, nonetheless, they have been providing their students with
appropriate educational opportunities. This result is consistent with sound public policy and with
the discharge by educators of their professional responsibilities.




Future Developments

The minimum competency movement has generated extensive debate and controversy. Its
future is uncertain. Part of the uncertainty arises because pending and future legal challenges
may invalidate entire programs or certain aspects of them. The Debra P. decision, the first
relating to a direct minimum competency challenge, has not resolved the matter; indeed, it may
have heightened the uncertainty by providing ostensible support for both supporters and
challengers of minimum competency testing.

Uncertainty about minimum competency testing extends beyond the legal arena, however. 
Educators and policy makers are divided about the likely effects of these efforts. Whether the
movement will improve education and educational outcomes by promoting more responsible and 
effective teaching, administering, and studying, or will victimize those who are held accountable 
by it, cannot be determined yet. In substantial part, the answer to that crucial question will turn 
upon the quality of further policy making that can shape or reshape minimum competency 
programs. It will also depend upon the care and skill exercised in implementing the policy 
thrusts. 

The evolution of performance testing in vocational education hopefully should benefit from
this experience in minimum competency testing. There are sufficient parallels to make this a
reasonable possibility. What is required of policy makers and practitioners in vocational
education is that they neither uncritically adopt performance testing as a solution to all their
problems, nor reject it out of hand because it will have to be developed and implemented with
thoughtfulness and care.

Legal principles, and the threat or actuality of litigation, may come to play an important role
in the evolution of performance testing programs, too. This role, it is hoped, will be a positive
one, requiring rationality, fairness and objectivity of the process, but not making impossible
demands. But vocational educators should not simply sit back and wait to be sued. They should
deal in some preventive maintenance: they should attempt to head off legal challenges by
fashioning and implementing performance testing programs in the most careful manner possible.
If they do so, the law and the courts will have been an important partner in educational and
professional reform.




Notes

1. See, e.g., Ralph D. Turlington, "Good News from Florida: Our Minimum Competency Program is
Working," Phi Delta Kappan 60 (May 1979): pp. 649-51.

2. See, e.g., Merle S. McClung, "Competency Testing Programs: Legal and Educational Issues,"
Fordham L. Review 47 (1979): 651-712; Donald W. Lewis, "Certifying Functional Literacy:
Competency Testing and Implications for Due Process and Equal Educational Opportunity,"
Journal of Law and Education 8 (April 1979): 145-83.

3. See Chris Pipho, "Minimum Competency Will Disappear But Other Controls Will Remain," Phi
Delta Kappan 60 (February 1979): p. 412.

4. Shirley B. Neill, "A Summary of Issues in the Minimum Competency Movement," Phi Delta
Kappan 60 (February 1979): pp. 452-53.

5. Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla. 1979). Another minimum competency
challenge, Green v. Hunt, Civil No. 78-539-Civ.-5 (E.D. N. Car. April 4, 1979), was dismissed for
procedural reasons.

6. Hernandez v. Board of Education, Lynwood Unified School District, Case No. SCC 01531
(Super. Ct. Los Angeles Co., filed May 1979); Wells v. Banks, Civil No. CV 478-138 (S.D. Ga.,
filed June 17, 1978).

7. See, e.g., Paul L. Tractenberg and Elaine Jacoby, "Pupil Testing: A Legal View," Phi Delta
Kappan 59 (Dec. 1977): pp. 249-54.

8. E.g., James v. Board of Education, City of New York, 42 N.Y. 2d 357, 397 N.Y.S. 2d 934, 366 N.E.
2d 1291 (Ct. App. 1977) (court refuses to intervene in determining whether integrity of citywide
reading tests had been compromised).

9. E.g., Chappell v. Commissioner of Education, 135 N.J. Super. 566, 343 A. 2d 811 (App. Div.
1975) (court refuses to intervene in "fundamental educational policy" decisions to initiate pupil
testing and to disseminate results).

10. E.g., Larry P. v. Riles, 343 F. Supp. 1306 (N.D. Cal. 1972), aff'd, 502 F. 2d 963 (9th Cir. 1974)
(cultural bias of I.Q. tests against black children). The court recently reaffirmed that decision.

11. Henry M. Brickell, "Seven Key Notes on Minimum Competency Testing," Phi Delta Kappan 59
(May 1978): pp. 589-92.

12. J. Stanley Ahmann, "Implications of the Minimum Competency Testing Movement for
Performance Testing in Vocational Education." An unpublished paper, 1979.

13. The term "state" includes not only state government but also other state and local
governmental bodies, including school districts.






LEGAL ISSUES 






14. The U.S. Supreme Court, until 1937, struck down numerous state laws as not having a "real and
substantial relationship" to permissible state purposes and, therefore, as being violative of
"substantive due process." The most notorious example of such activity was Lochner v. New
York, 198 U.S. 45 (1905), where the Court struck down New York's maximum hour legislation for
bakery employees. The breakthrough case where the Court applied the now common and more
relaxed "rational basis" test was West Coast Hotel Co. v. Parrish, 300 U.S. 379 (1937). Since that
time, other than in cases dealing with civil rights and civil liberties, virtually no state law has
been invalidated by the Supreme Court as being violative of "substantive due process."

15. Merle S. McClung, "Competency Testing: Potential for Discrimination," Clearinghouse Review
11 (1977): pp. 439-48.

16. E.g., Goss v. Lopez, 419 U.S. 565 (1975).

17. Stigmatization as infringing on a protected liberty interest was recognized in Wisconsin v.
Constantineau, 400 U.S. 433 (1971). See also Board of Regents v. Roth, 408 U.S. 564 (1972); Paul
L. Tractenberg, "Selecting 'Educationally Deprived' Students for Title I: A Review of the Legal
Issues" (an unpublished paper prepared for the National Institute of Education, 1977): 59-62.
However, in Paul v. Davis, 424 U.S. 693 (1976), the United States Supreme Court narrowed the
definition of stigmatization to require the "alteration of legal status which, combined with the
injury from defamation, justified the invocation of procedural safeguards." 424 U.S. at 708-09.

18. See, e.g., Rizzo v. Goode, 423 U.S. 362 (1976); William J. Brennan, "Address to the New Jersey
Bar," May 22, 1976 (reprinted in Guild Practitioner 33 (1976): pp. 152-68).

19. See, e.g., People v. Brisendine, 13 Cal. 3d 528, 531 P. 2d 1099, 119 Cal. Rptr. 315 (1975).

20. For a competency test to meet technical requirements, it must be shown that it is both valid and
reliable. Validity refers to whether the test actually measures the characteristic that it claims to
measure. Reliability refers to whether the test measures that characteristic accurately and
consistently. In the case of competency testing, an invalid reading test might actually be
measuring writing skills. An unreliable reading test might give a student who took the test twice,
using two different forms of it, a high score when he or she used form A and a low score when
he or she used form B. See American Psychological Association, Standards for Educational and
Psychological Tests (1974).

21. This also could be the basis for a due process challenge, namely, that the state was acting
irrationally. See, e.g., Arthur Wise, "Minimum Educational Adequacy: Beyond School Finance
Reform," Journal of Education Finance 1 (Spring 1976): 468-83; Joan Baratz, "In Setting Minimal
Standards Have We Abandoned Concerns for Equity and Access," Paper presented at
Wingspread Conference, Educational Policy Research Institute, Washington, D.C., July 1978. See
also Phi Delta Kappan 59 (May 1978), which contains a series of articles on minimum
competency testing.

22. San Antonio Independent School Dist. v. Rodriguez, 411 U.S. 1 (1973).

23. See, e.g., Serrano v. Priest, 18 Cal. 3d 728, 557 P. 2d 929, 135 Cal. Rptr. (1977); Horton v.
Meskill, 172 Conn. 615, 376 A. 2d 359 (1977).



24. In Washington v. Davis, 426 U.S. 229 (1976), the Court held that disproportionate racial impact
of a test is insufficient to establish an unconstitutional racial classification; a discriminatory
purpose must be shown. Several subsequent Supreme Court decisions shed light on how that
purpose may be shown. See, e.g., Village of Arlington Heights v. Metropolitan Housing
Development Corp., 429 U.S. 252 (1977). In light of this narrowing construction of the equal
protection clause, challenges based upon Title VI of the Civil Rights Act of 1964 and its
implementing regulations may be preferable. The U.S. Supreme Court indicated in Washington v.
Davis that disproportionate racial impact of a test might be sufficient to constitute violation of
Title VI. See McClung, supra n. 15, at 442.

25. See, e.g., Swann v. Charlotte-Mecklenburg Board of Education, 402 U.S. 1 (1971).

26. See Wisconsin v. Yoder, 406 U.S. 205 (1972); Tinker v. Des Moines Independent School
District, 393 U.S. 503 (1969); West Virginia State Board of Education v. Barnette, 319 U.S. 624
(1943). See generally McClung, supra n. 2, at 674-77.

27. The education clauses use a variety of formulations. Among the more common descriptions of
the required educational quality are the following: (i) "thorough and efficient" (e.g., N.J. Const.
art. VIII, §4, ¶1; Ohio Const. art. VI, §2; Pa. Const. art. III, §14; W. Va. Const. art. XII, §1); (ii)
"high quality" (e.g., Ill. Const. art. X, §1; Mont. Const. art. X, §1(3); Va. Const. art. VIII, §1); (iii)
"general and uniform" (e.g., Ariz. Const. art. XI, §1; Idaho Const. art. IX, §1; Ind. Const. art. VIII).

28. See Paul L. Tractenberg, "Legal Implications of Statewide Pupil Performance Standards." Paper
prepared for the Education Commission of the States, September 1977.

29. In Robinson v. Cahill, 62 N.J. 473, 303 A. 2d 273 (1973), the New Jersey Supreme Court
interpreted the state's "thorough and efficient" clause in that manner.

30. This is likely to be the most difficult link to establish. A performance testing program
undeniably is a rational way for the state to implement its educational obligation. But the state
will maintain that there are other rational ways available to it.

31. This approach would raise formidable proof problems and the challengers would have to
overcome a court's tendency to defer to the expertise of legislators or educators who have set
the standards.

32. See n. 27 supra.

33. Hernandez v. Board of Education, Lynwood Unified School District, supra n. 6.

34. E.g., P.L. 94-482, §112 (1976).

35. E.g., Title VI of the Civil Rights Act of 1964, 42 U.S.C. §2000d (1976); Equal Educational
Opportunity Act of 1974, 20 U.S.C. §§1701-1758 (1976).

36. Family Educational Rights and Privacy Act of 1974, 20 U.S.C. §1232g (1976), P.L. 90-247, as
added P.L. 93-380 and amended P.L. 93-568. Implementing regulations are at 45 C.F.R. 99.1 et
seq.

37. N.J.S.A. 18A:7A-2(a)(5), (6), (7).

38. E.g., Education for All Handicapped Children Act of 1975 (P.L. 94-142), 20 U.S.C. §§1401-1461
(1976). Implementing regulations are at 45 C.F.R. §§121a.1-.754 (1978).

39. Educational malpractice cases are probably the best known lawsuits regarding pupil
performance. Those cases are based primarily on common law negligence theories. The
assertion is that students have failed to learn because the schools and their professional staffs
have breached a duty of care and skill owed to the students. Thus far, educational malpractice
cases on behalf of "normal" students have been unsuccessful because of the courts' public
policy concerns about imposing such liability on school systems and professionals. See, e.g.,
Peter W. v. San Francisco Unified School District, 60 Cal. App. 3d 814, 131 Cal. Rptr. 854 (Ct.
App. 1976); Donohue v. Copiague School District, 64 A.D. 2d 29, 407 N.Y.S. 2d 375, 391 N.E. 2d
1352 (Ct. App. 1979). Cases brought on behalf of handicapped students alleging particular
negligent acts of specified professionals, rather than a general pattern of negligence, have been
more successful. See, e.g., Hoffman v. Board of Education, City of New York, 64 A.D. 2d 369, 410
N.Y.S. 2d 99 (App. Div. 1978). Recently, however, the New York Court of Appeals reversed the
Hoffman decision on public policy grounds. Although the results of performance testing in
vocational education might highlight inadequate performance of some students, those results are
unlikely to cause the judiciary to depart substantially from the policy approach it has staked out.
See generally Note, "Implications of Minimum Competency Legislation: A Legal Duty of
Care," Pac. Law Journal 10 (1979): 947-70.

40. See Ahmann, supra n. 12, at 8-11.






41. See, e.g., Robinson v. Cahill, 62 N.J. 473, 303 A. 2d 273 (1973).

42. Validity has both a generalized meaning of suitability and appropriateness, and a technical
psychometric meaning. As to the latter, see n. 20 supra.

43. See n. 20 supra.

44. See n. 17 supra.

45. See n. 16 supra.

46. See n. 24 supra.

47. In Village of Arlington Heights v. Metropolitan Housing Development Corp., 429 U.S. 252
(1977), the Court listed a number of factors that may be considered in establishing
discriminatory intent. These included: (1) historical background; (2) the specific sequence of
events leading up to the challenged decision; (3) the departures from normal procedural
sequences or typical substantive results; and (4) the legislative or administrative history.

48. See, e.g., Ingraham v. Wright, 430 U.S. 651 (1977); Rizzo v. Goode, 423 U.S. 362 (1976). See also
Tractenberg, supra n. 7, at 13.

49. 474 F. Supp. n. 23, at 261.

50. See Ahmann, supra n. 12, at 19.

51. The focus of this performance testing probably will be whether the student has adequately
mastered certain foundation or prerequisite skills.



52. Diana Pullin's paper deals with this issue in more detail.

53. The court in Debra P. deferred the Florida diploma sanction for four years because students
had not had adequate notice of, or opportunity to prepare for, the test. The court also
addressed the unavailability of meaningful remedial programs until shortly before the sanction
attached.

54. See New York Times, October 16, 1979, §C, at 1, col. 1.

55. If the performance testing standards were related to the marketplace but the instruction
provided in the program was not, there would be a mismatch between course and test content.
This would raise issues of substantive due process.

56. See Brickell, "Seven Key Notes," p. 592.

57. Ibid.

58. Evaluation instruments, and perhaps the performance testing standards themselves, should be
modified if, based on field testing or otherwise, valid educational or psychometric judgments
indicate that modification is required to [illegible] erected, however, to prevent this from being
an open door to dilution of standards. If standards were lowered so that they no longer were
reasonably related to the demands of citizenship and the job market, they could be challenged
on legal theories discussed previously.

59. Some of the primary elements of a fair and rational system are: (i) carefully developed
nondiscriminatory standards; (ii) valid evaluation instruments and procedures; (iii) an opportunity
for [illegible] initial evaluation results; and (iv) evaluation [illegible] assistance (or program
redesign) and reevaluation. Some commentators have also suggested that [illegible] should be
phased in so that students who [illegible] educational process do not have new and onerous
standards imposed upon them. See McClung, "Competency Testing," p. 2.




PULLIN 



Accountability Through Performance Testing 

Performance testing instruments and techniques are designed to foster accountability within
the vocational education system. The use of performance testing for accountability raises legal
concerns on behalf of both students and the vocational educators conducting the testing
program. In both cases, the importance of the legal considerations will be directly related to the
extent of the harm resulting from the use of the tests. In some instances, the legal issues for
educators and for students will overlap.


While this paper will focus for the most part on legal issues arising from harm to students
from a performance testing program, it is helpful to enumerate the legal impact on educators
themselves. Performance testing is initiated to foster accountability in vocational education, but
that accountability can be designed to diagnose weakness and provide effective feedback for
change or to diagnose weakness and eliminate that weakness. Results of student performance
tests can be used to evaluate and guide teacher or program effectiveness, or tests can be used to
assist teacher termination decisions. The former situation raises few legal issues; the latter
presents issues that have been addressed previously by the judiciary.

The most striking example of the use of student tests for teacher accountability involved the
termination of an elementary teacher due, in large part, to the performance of her students on
standardized achievement tests, the Iowa Tests of Basic Skills and the Iowa Tests of Educational
Development. While the trial court found that the dismissed teacher should be reinstated, an
appellate court disagreed and upheld the teacher's dismissal. The appellate court noted a dispute
among educators about the reasonableness of using the tests to assess teacher competence but
found that the action of the school board and the superintendent in the dismissal was
reasonable. It is not unreasonable to expect that a court might have the same reaction to the
use of vocational performance tests for teacher termination.

A court's analysis of the legality of the use of performance testing to evaluate and terminate
teachers rests in large part upon an examination of whether the scheme complies with
constitutional guarantees of due process of law or fundamental fairness.

Fundamental Fairness 

An area where educational and legal policy questions most closely coincide concerns the
fundamental fairness of performance testing programs. Within the legal system, this issue is
addressed by assessing whether the program meets constitutional standards of due process of
law, that is, whether the testing program is designed to serve a necessary and legitimate
governmental purpose and is formulated to serve that purpose through reasonable means.
Within the educational system, this issue is addressed by assessing whether a testing program
serves the educational goals and objectives of the schools.

Traditionally, constitutional guarantees of due process of law ensure that individuals are
treated with fairness, consistency, and lack of arbitrariness by governmental agencies and
employees. Due process protections are of two types: procedural and substantive. Procedural
due process protections seek to ensure that the procedures used by government in dealing with
individuals are fair. Procedural due process protections typically include the right to some form
of notification of impending governmental action and the right to effectively influence or
participate in governmental decision-making through hearings, representation by counsel, review
of evidence, and so forth. Substantive due process seeks to ensure that, regardless of the
procedures followed, the action undertaken by the government is reasonable and serves a
legitimate governmental objective or purpose.

Due process, both substantive and procedural, is an elastic concept requiring different levels
of protection depending on the context. The procedural protections that must be afforded a
defendant in a criminal trial are much more detailed than those that must be provided a student
who faces a long-term suspension from school. Similarly, the governmental objectives to be
served by a statute regulating conduct through criminal sanctions will be subject to a much
stricter substantive due process analysis than the objectives of a statute regulating the dress and
appearance of police officers. The meaning of due process, the delineation between
substantive and procedural due process, and the standards for determining what process is due
in a particular situation can be somewhat blurred. However, a due process analysis of
performance testing schemes offers guidelines to educational decision-makers and vocational
educators.


A substantive due process analysis ordinarily begins with an examination of the legitimacy of
the goal of the governmental program. This analysis of the "state interest" in a program can
rarely be conducted by referring to a full and clearly articulated statement by the governmental
agency made at the time the program was initiated; such statements seldom exist. Instead, a
court relies upon the government's after-the-fact rationale for its program or the court itself
defines what it feels a legitimate interest or goal might be. A substantive due process analysis
therefore begins with scrutiny of the goals, either explicit or implied, of a testing program. Next,
if the governmental goals are legitimate (and courts almost always find that they are), the means
of achieving the goal will be examined.



Two examples of judges' use of substantive due process to analyze educational practices
may be helpful. Both situations involved school discipline and the exclusion of students from
school for alleged violations of school rules of conduct. In the first case, a New Hampshire high
school student was indefinitely expelled from school for intoxication. Laws of the State of New
Hampshire permitted expulsion of students for "gross misconduct;" school rules specified that
students could be expelled for "undesirable behavior patterns." The expelled student's infraction
of the rules was her first offense, there was no evidence of any disruption of other students, and
evidence presented to the judge hearing the case indicated that the misbehavior was due in large
part to difficulties that student had been having in her relationship with her parents. In the New
Hampshire case, the court stated that:

It is fundamentally unfair to keep a student out of school because of difficulties
between the student and her parents, unless those difficulties manifest themselves in a
real threat to school discipline.

In reaching a decision which ordered the student reinstated in school, the court considered
the harm to the student in being excluded from school, the effectiveness of the exclusion in
deterring other student misconduct, and the failure of the school to prove that readmitting the
girl to school would cause significant harm to the school's functioning. In addition, the analysis
focused upon whether it is fair to punish students for behavior over which the students
themselves have little or no control.

In the second case, a brother and sister were both suspended from a Louisiana school
under a school rule which allowed for the discipline of a student when the student's parent
challenged the authority of school officials in an "offensive manner." The students were
suspended indefinitely and then transferred to a new school for disciplinary reasons after their
mother struck an assistant principal in the course of a discussion over his discipline of the
children. A federal court of appeals found the discipline of the students an infringement of the
right to substantive due process of law. The school rule punished students in the absence of any
personal guilt for the infraction and in a situation where the school could not meet a substantial
burden placed on it to justify its actions. The Louisiana case involved a similar analysis. There,
the court asked whether there was a justifiable and reasonable need for the school rule punishing
students for the misconduct of their parents and whether there was a reasonable and less
onerous alternative means for fulfilling the need the rule was designed to serve.

A second series of questions relating to the fundamental fairness of an educational program
or practice concerns the manner in which the program or practice was implemented. These
questions are sometimes treated as procedural due process issues and sometimes as substantive
due process issues. The implementation of a new program or practice presents issues that
relate both to the sufficiency of advance notice of the change (procedural due process) and to
the extent to which the implementation scheme reasonably and rationally furthers a legitimate
educational purpose (substantive due process). Because a procedural due process analysis is
most often applied to situations scrutinizing the mechanics of formal or informal procedures
involving hearings, the substantive due process rubric may be more helpful here.

One court was asked to apply a due process analysis to a situation in which a student
challenged the manner in which her graduate program changed the requirements for her
degree. In that case, the student argued that she was denied procedural and substantive due
process guarantees when a comprehensive examination was added as a graduation requirement
after she had commenced her graduate program. The appellate court considering the case
decided in favor of the school after analyzing the factors involved. The court, however, implicitly
recognized a due process right to timely notice of a change in graduation requirements.

There is clearly a legitimate governmental interest in maintaining order through school
discipline rules. The substantive due process considerations presented in the two cases
described above concern whether the school rules were a fair means of achieving that goal
and whether the rules were fairly applied. A due process analysis similar to that described in
the two discipline cases can be followed in examining school testing programs. The analysis
has already been applied to the statewide use of a minimum competency testing program to
deny high school diplomas.

The Fundamental Fairness Flaw In One Minimum Competency Testing Program 

A forecast of the type of substantive due process analysis that might be applied to
performance testing in vocational education can be formulated by examining a recent court
decision concerning the use of minimum competency testing to deny high school diplomas.
The decision was issued in the case of Debra P. v. Turlington in the summer of 1979 and was the
first judicial reaction to the legality of the minimum competency testing movement then
sweeping secondary schools. The lawsuit was brought by a number of students who
failed the competency test and would, as a result, be denied regular high school diplomas and
awarded instead certificates of completion of high school.

Florida's minimum competency test requirement was the result of a state law
concerning educational accountability. The law required that high school graduates have
at least the minimum skills necessary to function and survive in modern society and that students
demonstrate satisfactory performance in "functional literacy" to receive a high school diploma.




Pursuant to the statute, a minimum competency examination of functional literacy was
administered to Florida's public high school juniors and seniors. The functional literacy test was
first given to juniors in the fall of 1977; students who failed the test had two more chances to
take it before the graduation requirement was to be imposed in the spring of 1979. Substantial
numbers of students, and a disproportionate number of black students, failed the test.

The students who brought the lawsuit challenging the Florida testing program based their
challenge on several different claims: that the program resulted in unlawful racial discrimination;
that the program, through the remedial classes provided to students who failed the test, resulted
in resegregation of black students; and that the program denied due process of law. After a
lengthy trial, the court issued a decision that placed a four-year moratorium on the use of the
functional literacy test to deny high school diplomas.

The substantive due process analysis is of primary interest as an analogy for studying
performance testing. The Florida court had little difficulty in finding a legitimate purpose served
by the testing, i.e., ". . . the test could be utilized not only to gauge achievement, but also to
identify deficiencies for the purpose of remediation." The issue of the legitimacy of the means
used to reach this goal was of greater difficulty. The issue, as the court saw it, was ". . . whether
the test utilized was a valid and reasonable measure for dividing students into classifications for
the purpose of high school graduation." One might well ask whether the court was confused
about what the goals and means involved were. The due process issue which had been
presented to the court was whether the test instrument itself and the means used to implement
the testing program were fair means to achieve the goals of placing students in remedial classes,
labeling test failers as "functional illiterates," and determining the award of high school diplomas
in lieu of certificates of completion.

A major criterion for review of the testing program concerned whether adequate notice of the
change in the graduation requirement was provided to parents, students, and educators. Florida's
statute was passed in the summer of 1976; the standards and objectives to be measured on the
test were established in the spring and summer of 1977; the first functional literacy examination
was administered in the fall of 1977. In effect, teachers in Florida's high schools had only two
months of class time to work with students on the new functional literacy skills measured on the
test, skills which the court found had not previously been successfully taught to all of Florida's
students.

The court recognized the need to inform students and educators of the importance of the
test and the sanctions to be imposed as a result of the test, in addition to the subject matter to
be examined. The court recognized the educational implications of adequate notice:

While all instruction is important, there are obvious methods of motivating students
and emphasizing certain skills. The principal problem with the instant program is that
the instruction in previous years took place in an atmosphere without the diploma
sanction . . . It is critical that at the time of instruction of a functional literacy skill, the
student knows that the individual skill that is being taught must be learned prior to his
graduation from a Florida public high school. Instruction in the specific skills is critical,
but likewise so is identification of whether the skills have been learned. Teaching and
learning are not always coterminous.

Based upon the expert testimony of several educators, the court concluded that four to six
years should intervene between the time the objectives to be measured on the test are made
public and the time the sanction resulting from the test is implemented.






To assess the validity, reasonableness, and arbitrariness of the minimum competency test,
the court discussed the content and construct validity of the test and alluded to other technical
flaws in the test development and administration process. The court noted errors of
"considerable magnitude" in test development and administration but nonetheless found adequate
levels of content and construct validity. The court found that, even if Florida's test developers did
not meet appropriate professional standards or "state of the art" requirements, constitutional due
process standards are not identical to professional test and measurement standards. A test,
according to the Florida court, need only bear a rational relation to a valid state interest. The
constitutional standards for test instruments themselves are therefore lower, in certain cases,
than professional standards.

Fundamental Fairness In Performance Testing For Vocational Education: Some Recommendations

In the context of performance testing in vocational education, what process is due?
Clearly, for those programs where successful test performance is required to exit from the
program, obtain a certificate or license, or enter an apprenticeship following the formal
training, students should be fully informed of the test requirement before entering the program.
The nature of the sanction, e.g., failure to complete a course of study, to obtain a license or
certificate, or failure to be apprenticed, is of sufficient magnitude to require early and
complete notice. What of tests of less magnitude? Given the reluctance of the judiciary to
become involved in educational decision-making, particularly in individual relationships between
instructors and students, a court may never intervene to determine the degree of due process
appropriate for such a situation. Court intervention, and the extent of such intervention, will
always hinge on the extent of the harm resulting from a program or practice. However, the basic
tenets of due process would indicate that, if imposed, the due process requirements are less
strict and the notice can be less complete when tests have less importance. An instructor in an
occupational home economics course giving a test at the end of a teaching unit on metric
conversion would, for example, be held to far less strict requirements than was the State of
Florida in testing to deny high school diplomas.

The nature of judicial involvement to one side, would it not be appropriate for educators to
impose some due process, or fundamental fairness, requirements upon themselves in the
classroom testing situation? Such requirements would undoubtedly foster better teaching and
more effective learning. Educators have recognized the importance of careful objective setting
for both teacher and learner. There should be little dispute that vocational students would
benefit from knowing in advance what is to be expected of them as a result of their training, and
that learning will improve as goals are clearly identified and worked toward. Any constitutional
due process standard of notice that would be applicable in this situation would not impose an
additional requirement on educators but would instead simply restate the parameters of good
educational practice.

Assuming that a performance testing requirement has been fairly imposed, some guidelines
concerning the nature of the test itself can also be drawn from the Florida court's reaction to the
high school minimum competency test. What technical standards of the test and measurement
profession have been recognized by the judiciary as applicable to educational testing?

The Florida court, in its discussion of due process notice requirements, was, in effect,
acknowledging the seldom recognized but increasingly important concepts of curricular and
instructional validity. There was, in short, no match between the functional literacy skills and
objectives measured on the minimum competency test and the curriculum and instruction
offered the students who were required to pass the test to receive a high school diploma.






To achieve fundamental fairness in performance testing for vocational educators, schools 
and instructors administering the tests should follow the following guidelines: 

• Students should be informed of the existence and the nature of the testing requirement
well in advance of taking the test.

• If the performance test will be required to exit or graduate from the training program,
the student should be informed of the test before entering the program.

• If the performance test will be required to complete a course or unit of study
successfully, the student should be informed of the test before beginning the course or
unit.

• Students should be informed of the subject-matter, skills, and objectives to be measured by
the test.

• The curriculum and instruction offered the student should cover all subjects, skills, and
objectives to be measured by the test.

• The test should only measure those areas actually covered by curriculum and instruction.

• The test instruments or techniques used should meet professional standards for validity and
reliability.



Performance Testing and The Potential For Unlawful Discrimination 

In addition to the fundamental fairness issues addressed by the Florida court considering 
minimum competency testing, the court also addressed issues of unlawful racial discrimination 
resulting from use of the functional literacy test. Similar issues are presented by performance 
testing in vocational education. 

Florida's functional literacy test, after the third administration just prior to graduation, had
failure rates that clearly indicated that the testing program impacted disproportionately on black
students. The failure rate for black students was approximately ten times that among white
students. The students challenging the test alleged that the test results for black students
reflected the educational deprivations those students had suffered; the high school seniors who
faced the test-for-graduation requirement spent the crucial first four years of their schooling in
inferior, racially segregated schools. In the years since physical integration of the schools, black
students continued to suffer ongoing discrimination. Poor test performance for black students
both reflected and perpetuated the effects of past racial discrimination.

The judge, considering these arguments against the Florida test, determined that the use of
the functional literacy test to deny high school diplomas constituted unlawful racial
discrimination. The test, the court concluded, should not be used as a graduation requirement
until all of the seniors compelled to meet the test-for-graduation requirement had completed a
full twelve years of physically desegregated schools. Thus, a four-year moratorium on the use of
the test as a graduation requirement was ordered.

The race discrimination analysis in the Florida case was based upon both a constitutional
and a statutory theory. Under the constitution, the testing program denied equal protection of
the laws to black students. Under federal statutes, the test violated Title VI of the Civil
Rights Act of 1964. The use of constitutional and Title VI theories to scrutinize an educational
practice has been employed recently, with perhaps even more far reaching implications
than the Florida case, by a federal court in California. The California case concerned the use of
I.Q. tests to place students in classes for the educable mentally retarded (EMR).¹³ Classes for EMR
students were populated with a large percentage of black students, a percentage considerably
higher than the percentage of blacks in the total school population. The court found it unlawful to
rely upon I.Q. tests to determine EMR placement when there is no proof that those tests are valid
and suitable for use with black students and there is no proof that use of the tests or resultant
disproportionate class placements furthered the purpose of providing the best educational
opportunities for students.

Challenges under both constitutional and Title VI theories could also be brought against
performance testing in vocational education. There are also additional legal claims that can be
brought in the vocational education context.

Programs that receive federal financial assistance are obligated to comply with an array of
statutes and regulations prohibiting discrimination on the basis of race, sex, national origin,
color, or handicap.¹⁴ The substance of these prohibitions can be fairly summarized by reference to
the March 21, 1979 "Guidelines for Eliminating Discrimination and Denial of Services on the
Basis of Race, Color, National Origin, Sex, and Handicap" promulgated for vocational education
by the Office of Civil Rights, Department of Health, Education, and Welfare. These vocational
education guidelines set forth nondiscrimination requirements concerning distribution of funds,
access and admissions to programs, counseling and prevocational programs, instructional
programs, employment of faculty and staff, and proprietary schools.

The vocational education guidelines set forth several standards relevant to performance [...]
However, if a program can demonstrate that the criteria for admission [...] the protected groups.
A performance test measuring entry level skills used to select candidates for a vocational
program could be subject to scrutiny under the guidelines if the [...] not be used for admissions
purposes unless there was no other valid way of assessing the entry level skills that did not have
a disproportionate result. [...] disproportionate minority failure rates.






Nondiscriminatory Performance Testing: Some Recommendations

Legal standards regarding nondiscrimination are not designed to prohibit testing nor to
circumvent the primary use of tests, e.g., discriminating between those who know or can perform
and those who do not know or cannot perform. The legal standards being discussed here do,
however, prohibit distinctions between test takers when the distinctions are based upon
protected status, such as race, rather than upon knowledge or skill.

When the results of a test make it appear that distinctions were based upon race, national
origin, color, or sex rather than upon true ability to perform the tasks being tested, then
educators are asked to scrutinize their conduct to eliminate bias. This scrutiny has two phases:
Does the test really measure something that has to be performed to succeed in the vocation for
which the student is being trained, and is this test the only valid source of measurement or is
there an alternative that will achieve the same goal without harming minorities?

In beginning a performance testing program, vocational educators can take the following
steps to minimize the potential for unlawful discrimination:

• Test only at the basic level at which competence must be demonstrated; if a program is
designed to produce apprentice plumbers, the performance test used for exit from the
program should not measure skills that only a master plumber would be expected to know.

• If test results indicate that a disproportionate number of minority students are failing the
test, determine whether there is a different but equally valid test that would measure the
same areas, without the disproportionate result. Also, determine whether the test results
reflect past deprivations and how these can be remedied through compensatory educational
programs.

Performance Testing and the Right to Privacy

A final set of issues of legal concern relates to the use of performance test results once they
are obtained. What use is made of the test results within the vocational program, and how or
where are performance test results disseminated outside the training program? For educational
programs receiving federal financial assistance, there are clear standards concerning privacy and
confidentiality under the Family Educational Rights and Privacy Act (FERPA).¹⁶

FERPA details protections for students concerning information, such as performance test
results, contained in school records. Test results may not be disclosed to someone who does not
have a "legitimate educational interest" in seeing the results without written consent from either
the parent of the student or, for students over eighteen years old, the students themselves.
Persons with "legitimate educational interests" and for whom consent is therefore unnecessary
are probably only persons directly involved in the student's training program. Potential
employers clearly should not receive such information without written consent; potential
supervisors for a work-study or apprenticeship experience probably should not receive the
information without written consent.

The federal student records law also requires that students and parents be provided
interpretations of test result information should they request it. This provision clearly points to
the need for careful and unbiased record keeping, the use of valid and defensible tests, and the
need for trained personnel who can explain and counsel about performance testing.






In addition to the requirements of the federal records statute concerning privacy and
consent, there are potential problems of a constitutional dimension concerning performance
tests and the use of test results. Some educators may be inclined to [...]
to assess student attitude about a task or vocation. Such attempts unlawfully infringe upon
students' privacy [...] when they scrutinize areas that are actually unrelated
to successful performance in either the training program or the vocation. For example, a female
student's attitude about pregnancy and child-rearing has no bearing on her potential as a
secretary.

Privacy and Confidentiality in Performance Testing: Some Recommendations

To minimize potential infringement of students' privacy and the right to confidentiality, the
following guidelines are appropriate:

• Test scores should not be disclosed to persons outside the school or to those not directly
involved with the student's training without consent.

• Test scores should not be divulged to potential employers without the written consent of the
parent or, where the student is over eighteen, the student.

• Interpretation of test results should be made available to students and parents.

• Tests should not include questions that unnecessarily infringe on students' privacy.

Conclusion

The use of performance testing in vocational education can lead to desirable improvements
in the delivery and outcome of training programs. Performance testing does present potential
legal problems of some magnitude. None of these problems is insoluble and, in fact, a wise
vocational educator will work to alleviate legal entanglements and will, at the same time, have
improved the educational program.

To maximize the educational benefits of a performance testing program and to minimize the
impact of legal scrutiny of the program, vocational educators should structure the program so
that there is adequate phase-in time prior to implementation of the test. During the phase-in
period, there should be efforts to insure the validity and reliability of the test
instrument. During the phase-in period, instructors should inform students of the subject-matter,
skills, and objectives to be measured on the test and should insure that the areas covered on the
test are in fact being taught all students. Next, educators and test developers should insure that
tests do not unlawfully discriminate against students on the basis of race, sex, national origin, or
handicap. Finally, steps should be taken to protect the privacy of students participating in the
testing program.






Notes

¹Scheelhaase v. Woodbury Central Community School District, 349 F. Supp. 988 (N.D. Iowa
1972), rev'd, 488 F.2d 237 (8th Cir. 1973), cert. den. 94 S.Ct. 3173.

²Cook v. Edwards, 341 F. Supp. 307 (D.N.H. 1972).

³341 F. Supp. 309.

⁴St. Ann v. Palisi, 495 F.2d 423 (5th Cir. 1974).

⁵Mahavongsanan v. Hall, 529 F.2d 448 (5th Cir. 1976).

⁶474 F. Supp. 244 (M.D. Fla. 1979).

⁷A black student had a ten times greater chance of failing to graduate than did a white student.

⁸474 F. Supp. 260.

⁹Id.

¹⁰474 F. Supp. 264.

¹¹The clearest example of this judicial reluctance is a recent U.S. Supreme Court case, Board of
Curators of the University of Missouri v. Horowitz, 98 S. Ct. 948 (1978). In that case the Supreme
Court noted, in discussing the academic expulsion of a medical student, that a student's
academic status requires expert evaluation of cumulative information and a court should decline
to overturn the judgment of educators.

¹²Robert F. Mager, Preparing Instructional Objectives (Belmont, Calif.: Fearon Publishers, 1975).

¹³Larry P. v. Riles, No. C-71-2270 RFP, N.D. Calif. October 10, 1979.

¹⁴Discrimination on the basis of race, color, or national origin is prohibited by Title VI of the Civil
Rights Act of 1964, 42 U.S.C. §2000d. Discrimination on the basis of sex is prohibited by Title IX
of the Education Amendments of 1972, 20 U.S.C. §§1681 et seq. Discrimination on the basis of
handicap is prohibited by §504 of the Rehabilitation Act of 1973, 29 U.S.C. §794, and by P.L.
94-142, Education for All Handicapped Children Act, 20 U.S.C. §§1401 et seq. Each relevant
statute has a set of implementing regulations, written by the U.S. Department of Health,
Education and Welfare, to further clarify the law. Finally, Title II of the Education Amendments
Act of 1976, 20 U.S.C. §§2301 et seq. and "Guidelines for Eliminating Discrimination," 44 Fed.
Reg. 17162, referred to hereafter as "voc ed guidelines," also contain relevant nondiscrimination
provisions.






¹⁵For the purposes of this discussion, a "disproportionate result" or "disproportionate effect" of a
test is defined as a circumstance in which the total percentage, or proportion, of minority
students failing the test is greater than that group's proportion in the total group of students
taking the test.
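
The definition in this note can be made concrete with a short calculation. The sketch below is illustrative only: it reads the definition as comparing the minority group's share of all failures with its share of all test takers, and the function name `has_disproportionate_result` and the enrollment figures are hypothetical, not drawn from any case discussed here.

```python
def has_disproportionate_result(minority_takers, minority_failures,
                                total_takers, total_failures):
    """Apply the note's definition of a disproportionate result.

    Returns True when the minority group's share of all failures is
    greater than its share of all students who took the test.
    (Hypothetical helper; the reading of the definition is an assumption.)
    """
    if total_takers <= 0:
        raise ValueError("total_takers must be positive")
    if total_failures == 0:
        return False  # nobody failed, so no disproportion can exist
    # Compare the two shares by integer cross-multiplication, which avoids
    # floating-point rounding:
    #   minority_failures / total_failures > minority_takers / total_takers
    return minority_failures * total_takers > minority_takers * total_failures


# Hypothetical figures: 200 students take the test, 50 of them minority students.
# 40 students fail, 20 of them minority students.
# Minority share of takers is 25 percent; minority share of failures is 50 percent.
flagged = has_disproportionate_result(50, 20, 200, 40)
```

A result of True under this definition signals only that closer scrutiny of the test is warranted; it does not by itself establish unlawful discrimination.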



¹⁶20 U.S.C. §1232g. The implementing regulations are found at 45 C.F.R. Part 99.






LEGAL ISSUES 



Comments on the Legal Issues
in Performance Testing

William G. Buss

Iowa College of Law
Iowa City, Iowa



A hard lesson for law students to learn is that the expertise of a lawyer has much more to do
with predicting legal outcomes than memorizing a set of rules. Making such predictions entails
familiarity with the process of decision, appreciation of the distinct institutional roles of court
and other decision-makers, and awareness of the constant interplay of fact determination and
value judgment. Making such predictions employs a process of reasoning that is hardly
scientific; in fact, a reasoning process that takes uncertainty as a pervasive feature of a dynamic
system. Part of the lesson to be learned is that the predictions are characteristically tentative and
often amount to little more than identification of alternative possibilities.

The truth contained in this lesson can be seen in the papers by Tractenberg and Pullin
dealing with legal implications of performance testing for vocational education. These papers do
not tell us (as they cannot) what legal results will follow from "performance testing;" they
merely give tentative predictions or, more accurately, they provide a legal framework within
which predictions might be made. They tell us a little about the way courts work. For example,
they make it clear that courts attempt to assimilate "real world" problems into legal categories,
such as "due process of law" or the "equal protection of the laws" or a "right to privacy."

Tractenberg and Pullin tell us that the courts will both second guess educational judgments
and defer to educational expertise, and they try to suggest when courts will do more of one and
when more of the other. They tell us that the courts will examine facts, such as those provided
in the testimony of educational experts or written in educational books or, perhaps, facts that are
"known" by everyone, including judges, such as facts concerning the existence and disadvantage
of racially segregated schools. They tell us also that the courts will make value judgments, such
as those involved in, somehow, "weighing" the interests of individuals who may be harmed by
denial of a high school diploma against the interests of the state in safeguarding the significance
of a high school diploma.

Finally, Tractenberg and Pullin tell us that to hazard a prediction concerning the success of
various legal challenges to performance testing one must embark on a process of reasoning that is
truly labyrinthian. For example, to predict the outcome of a discrimination challenge one must
assess the intertwining significance of (a) certain Supreme Court cases dealing with the equal
protection clause of the Fourteenth Amendment; (b) certain Supreme Court cases dealing with
statutory provisions, such as Title VII of the Civil Rights Act of 1964, as amended (preventing
employment discrimination); (c) a body of literature (not court decisions) dealing with
competency testing (not, as such, performance testing in vocational education); (d) a single case
by a court at the lowest level of the federal judicial system dealing with a particular competency
testing law (again, not a law dealing with vocational education) in the particular context of a
state educational system not yet freed from the constitutional implications of having had, prior to
1954, separate schools for black and white children.



To be sure, one can find some of this summary only by reading between the lines of the
papers. But that, of course, is because these papers have other purposes, and [...]
students "pass" the test used; and let us assume a legal challenge based on discrimination
against blacks.

If this legal challenge is founded on the equal protection clause of the fourteenth
amendment of the United States Constitution, a Supreme Court case, Washington v. Davis, 426
U.S. 229 (1976), poses a major obstacle. According to that case, the fact that a significantly
higher proportion of black than white applicants fail an employment test does not, without more,
show that the test was racially discriminatory; proof of a discriminatory purpose is required. The
Court has also said, in Washington v. Davis and subsequently, that a challenger could prove the
required discriminatory purpose indirectly. To this end, statistics showing a racially
disproportionate impact would be relevant but not conclusive factual information. The Court
noted in this context the case of Yick Wo v. Hopkins, 118 U.S. 356 (1886), in which the
disproportion was on the order of 99 [...]. That reference, plus the fact that the disproportion in
Washington v. Davis was 57 percent to 13 percent, suggests that the 80 percent to 60 percent
imbalance of the illustration would provide relatively weak evidence of discriminatory purpose.

In Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla. 1979), discussed in both papers, a
federal district court discussed and ultimately distinguished Washington v. Davis in connection
with its consideration of a challenge to Florida's competency test for high school graduation.
The district court in Debra P. conceded that neither the disproportionate incidence of failure
(ten blacks to one white in that case) nor the fact that the responsible education officials
had anticipated this disproportion demonstrated a racially discriminatory purpose. But a
distinction was found in the fact that the black students who were challenging the test were
assumed to have suffered educational disadvantage attributable to school segregation. This
critical fact provided the basis for a legal conclusion that the past discriminatory purpose of
segregated schools was perpetuated by the present competency testing program. Yet the locus of
Washington v. Davis was in the District of Columbia, where the Jim Crow practice of separate
but equal facilities for blacks and whites was prevalent in the public schools and other aspects of
the city's public life. Since this background was not persuasive to the Supreme Court in deciding
whether a discriminatory racial purpose was shown (or that its absence should be discounted), it
is not obvious that the background of de jure school segregation should have been persuasive to
the court in Debra P. Just as the inferior education of segregated schools might explain a lower
rate of passing Florida's competency test, the Supreme Court has explicitly observed (in Griggs
v. Duke Power Co., 401 U.S. 424 (1971)) that such segregated education would disadvantage
blacks in employment testing.

The Washington v. Davis precedent is significant because it makes proving the existence of a
racial classification so difficult. That would not, technically, defeat the equal protection-based
discrimination challenge to the hypothetical test which determines work studies eligibility. The
challenger could, in any event, argue, correctly, that the test for admission to the work study
program is government action that classifies (between those who pass and those who do
not) and that only government classifications which allocate benefits (or burdens) reasonably
are consistent with equal protection. But, in the absence of a racial classification (or some other
ingredient which performs the same function and assumed here not to be present), the
reasonableness of the classification is made virtually invulnerable because of the controlling
significance given to the institutional roles of court and educational tester.

With no proof of purposeful race discrimination (or the equivalent), the courts "defer" to the
educational tester because the courts believe that our political system gives educators the role of
making the critical judgment about what is reasonable. That is, in the court's view, the educators
are empowered to judge the reasonableness of the classification resulting from the test, and the
courts lack the competence as well as the authentic power to second-guess that judgment. As
ordinarily framed, the governing legal principle requires the challenger to "prove" that there is no
rational basis relating the classification (test passers vs. test failers) to the legitimate purpose of
the test (e.g., to select those who would profit, or profit most, from the work-study program). It is
generally conceded that the challenger will be able to meet the test so infrequently that the
challenger's probability of success should be rated at 0.

Let us assume now that the challenger founds the challenge not on the constitution, but on
some statutory and/or regulatory provisions that prohibit racial discrimination in providing
work-study opportunities of the kind in question. Under this assumption, Washington v. Davis is
not a direct barrier. Furthermore, the Washington opinion reaffirmed Griggs v. Duke Power Co.,
which had held that, under Title VII of the Civil Rights Act of 1964, an employment test having a
racially disproportionate impact is illegal if not validated. To validate a test the user must show
that the test is an effective device for selecting the more qualified employees. In general, the
different Griggs/Washington results are explainable in terms of differences in institutional roles
and of the different implications of constitutional and statutory decisions. In Griggs, Congress
had deliberately singled out employment discrimination based on race as an area of concern; in
Washington, by contrast, the Court had no such policy decision to rely upon. Furthermore, the
Griggs result (prohibiting unvalidated tests because of their disproportionate racial impact) was
confined to the employment focus of the statute; by contrast, a disproportionate impact decision
in Washington would have had sweeping implications over a broad range, including such
far-reaching areas covered by the equal protection clause as criminal law, taxation, and welfare.

All of this suggests that the disproportionate impact challenge in our illustration might find
smooth sailing if it is based on a statute or regulation. That conclusion is far from inevitable,
however. Title VI of the Civil Rights Act of 1964 provides the most obvious statutory source of
such a challenge, but the rationale distinguishing Griggs and Washington may not favor the
challenger relying on Title VI. That statute does not represent a deliberate policy judgment that
racial discrimination in vocational education, or even in education generally, should be singled
out as an area of concern; Title VI applies to all programs receiving federal financial assistance.

As a consequence, any adoption of a disproportionate impact principle for Title VI could
have broad application over many areas. In fact, a majority of the Justices of the Supreme Court
have indicated, in University of California Regents v. Bakke, 438 U.S. 265 (1978), that the
anti-discrimination principle in Title VI is identical to the anti-discrimination principle of the
equal protection clause.

Reliance on the regulations issued under the Vocational Education Act would appear to face
comparably difficult problems. The act itself contains no anti-discrimination provisions (based on
race) and plainly does not represent a deliberate Congressional policy decision to prevent race
discrimination in vocational education. The regulations under the Vocational Education Act do
expressly prohibit racial discrimination, but these regulatory provisions draw their authority, not
from the Vocational Education Act, but from Title VI. Unless there is some reason to read the
regulations more broadly than their authorizing legislation, it would seem that the
anti-discrimination principle of Washington v. Davis, the equal protection clause, and Title VI would
also apply to these Title VI based regulations. In fact, in the Bakke case, a majority of the Court
evidently gave little significance to Title VI HEW regulations which tended to support racially
conscious affirmative action to overcome the effects of past discrimination. And, in other recent
decisions the Supreme Court has not been willing to follow anti-discrimination regulations that
seem to range beyond the scope of authorizing legislation. See Southeastern Community
College v. Davis, 99 S.Ct. 2361 (1979) (handicapped); General Electric Co. v. Gilbert, 429 U.S.
125 (1976) (employment discrimination).

Let us assume now that the difficulties considered here can be overcome and that the
racially disproportionate impact resulting from the work studies test of the illustration would
require the test to be validated. At this juncture, a court would be faced with a second basic
issue, and this issue combines legal and non-legal elements. The court must set itself up as
something of an expert in testing. This might be accomplished in various ways: by the court's
actually acquiring the expertise itself, by its use of a court-appointed expert, by relying upon the
expert witnesses and/or arguments of the parties. But the court must accomplish this somehow.
That is, somehow the court must put itself in a position to understand what is meant by such
things as construct validity, content validity, criterion-related validity. It must be able to decide
which of these techniques is appropriate; and it must understand whether an appropriate
technique has been correctly used. But it is not accurate to think of the court, simply, as
assuming the role of a testing expert. In the end, a legal requirement of test validation poses a
legal test. There is no automatic identity between something like "acceptable professional
standards" and "acceptable legal standards"; the law may require more or less or [...]. For
example, it may be argued that even though the test in question meets professional standards,
there is an alternative test (or an alternative to the test) that, at the same time, would be effective
in selecting students ready to profit from work study, but would have a significantly reduced
tendency to exclude black students. The court must decide whether such a less restrictive
alternative test is legally required, and the court must decide what should be accepted as a
sufficiently effective selector and as having a sufficiently reduced racial impact.

Although it is accurate to characterize these decisions as ultimately "legal" decisions to be
made by the court, it would certainly be misleading to imagine that the courts would ordinarily
be free of the influence of the "real experts" in making those decisions. The extent of this
influence on the court's decision defies prediction (and, perhaps, even defies accurate
after-the-fact assessment).

Both the narrow illustration of this comment and the more far-ranging discussion by
Tractenberg and Pullin lead to several clear implications for performance testing in vocational
education. First, legal challenges to these programs will be made, both because perceived
injustices are involved and because the legal machinery is at hand. Second, accurately predicting
the outcome of the legal cases that will be brought is well beyond our collective wisdom at the
present time. Third, the certainty of lawsuits and the uncertainty of results will feed upon
themselves and create a distinct, though unknowable, reality of its own. This new creation will be
shaped by two evolutionary processes: the one identified by the courts' episodic attempts to
understand the world of education and testing and to articulate legal rules responding to that
understanding; the other identified by the on-going attempts of educational planners to
anticipate (and to avoid) the "worst" and of litigators to anticipate (and to exploit) the "best" of
the emerging legal doctrine.






Part of our conventional wisdom, based on the insights of de Tocqueville, is that political
questions sooner or later become judicial questions in the United States. What is seldom noticed
is that, in the process of assimilation, the underlying issues are changed and distorted, initially
in their new legal setting and eventually in themselves.






CHAPTER FIVE 



IMPLEMENTATION ISSUES 






The successful implementation of any program or product is not an easy task. Care must be
taken that the steps in any implementation plan be carefully identified and analyzed. These
concerns are addressed in Chapter Five.

First, H. Brinton Milward discusses performance testing as an organizational innovation, not in
the conventional sense of the term (i.e., measuring the performance of a student) but "rather of
the performance of an entire training program or of an instructor." He introduces such concepts as
"ideas in good currency" as a necessary precondition to the adoption of any innovation. The
remainder of the paper focuses on the diffusion and adoption of the innovation within an
organization, the role of "street-level bureaucrats" in implementation, and a technique, "mapping
backwards," to arrive at an estimate of what will be needed to successfully implement a program
or practice.






In contrast to the organizational theory perspective presented in Milward's paper, Curtis R. Finch
addresses the implementation from a more pragmatic point of view. He describes the
implementation setting and identifies a series of considerations which should be kept in mind:
curricular, teacher and ancillary personnel, administration, student and community. The points
raised by both contributors are discussed by Janet E. Spirer in the Comments paper.






Performance Testing as an Organizational Innovation

H. Brinton Milward
University of Kentucky
Lexington, Kentucky

Performance tests are not innovations in vocational education. Vocational education has
long given a broad variety of performance tests to certify students for particular occupations and
trades. Thus, these tests are not an "innovation" in the conventional sense of the term that refers
to a product or practice new to the adopting unit.

This paper will be concerned with the actual or potential use of the results of performance
tests, not as measures of the performance of a student, but rather of the performance of an
entire training program or of an instructor. From this perspective performance tests are an
innovation that states, school districts, or the federal government could use to evaluate the
performance of instructors or programs. Thus, in the context of this paper, a performance test
would not be an innovation to a welding teacher: it would be an innovation to the staff of the
state office of vocational education who would use the aggregated results of students' scores to
evaluate how successful a given program was in actually training people for specific occupations
and trades.

The impetus for using performance tests as instruments of vocational education program
evaluation comes from the implementation of the 1976 Amendments to the Vocational Education
Act of 1968.¹ The act stipulates that both the Bureau of Occupational and Adult Education and
the states shall audit and review vocational programs to make sure they are the best possible
programs of vocational education. The role given to the states is very explicit: ". . . each State
shall evaluate, by using data collected, wherever possible, by statistically valid sampling
techniques, each such program within the State which purports to impart entry level job skills. . . ."²

In other words, vocational education must become result oriented. Increasingly, the
emphasis will be on what the students can do in the occupations they have been trained for
rather than an evaluation that is oriented toward the process by which students have been
trained.³

A response to the legislation and the general concern for government accountability has
been improved monitoring of the effectiveness of training. Performance testing can be used as
one mechanism for assessing outcomes of the training process. In the past, performance tests have
seldom been used in this fashion, and most evaluation efforts in vocational education have
been ". . . too casual, informal and fragmented and have only rarely served the cause of program
improvement. . . ."⁴

What is occurring in vocational education is no different from what is occurring in a variety
of other programs. The federal government is attempting to increase the analytic capability of the
states ". . . to strengthen state leadership in education, to put more of the monitoring
responsibility in the hands of state education agencies."⁵ This has resulted from the federal
government's inability to monitor or control effectively the behavior of thousands of programs
scattered across the country. In intergovernmental relations, the federal government has become
a provider of funds and a writer of guidelines and requirements. The states' role has become that
of federal [illegible], and the local programs are the delivery agents. This explains why
the role of the states in evaluation of performance of local programs has become such a salient
issue.

Ideas in Good Currency

There is a direct connection between adoption of an innovation and ideas in good currency.
Ideas in good currency are a necessary precondition for the adoption of public policy
innovations. The space race of the 1960s as well as the law and order movement of the late
1960s and early 1970s are examples of ideas in good currency. "Among their characteristic
features are these: they change over time; they obey a law of limited numbers and they lag
behind changing events."⁶ The "failure of the schools" idea, which has led to accountability
measures and minimum competency testing, is the idea in good currency behind performance
testing as an instrument of evaluation.

New ideas in good currency usually emerge from a disruptive event or a series of events.
These perceived crises set up a demand in society for new ideas to solve these problems. It is at
this point that ideas that are beyond the mainstream of the public agenda begin to surface.
This occurs through a process of diffusion that depends upon interpersonal networks and upon
the communication media which, in turn, shape the idea to their needs. In the case of
competency testing, traditionalists used the "failure of the schools" idea to try to abolish many of
the non-traditional courses and programs which were developed in the 1960s and 70s.

Ideas must gain entry to the limited set of channels through which formal policy agendas are
set. As Schon wrote, ". . . they require, in the approval of administrators, commissions, notable
personages, legislators and the like, a kind of benediction."⁷ This power is used sparingly and the
decision to do so comes usually from a shared calculation of the idea's relation to personal and
political interests and of the support the ideas have already gathered. As Feller, Menzel and
Engel found in the case of federal legislation, the adoption of a new technology is often
directly traceable to the passage of a new highway or air quality act. "Although federal
legislation seldom mandates the adoption of a specific technology, the 'choice' may be narrowly
defined."⁸ Thus, ideas in good currency can affect diffusion patterns through an
intergovernmental network.

Implementation as an Interorganizational Process

Innovations based on ideas in good currency must diffuse and be adopted as well as
implemented into practice through an interorganizational network. There are two features of the
vocational education network that distinguish it. First, since education is a state rather than a
federal function in the United States, there is no national configuration to the network in terms of
equivalent organizations, actors or practices. Second, the delivery system for vocational
education and training is difficult to distinguish from the general network of education.

For the most part, the vocational education delivery system is the same system used to
educate all of the secondary, postsecondary, and adult students. School principals,
superintendents, presidents, directors, and boards (those who make the decisions for education in
general) also make the majority of the decisions for vocational education.




There are a large number of organizations that shape policy and delivery of services in
vocational education. These include, for example, the Department of Education, State Boards of
Vocational Education, State Advisory Councils for Vocational Education, Local Boards of
Education, the Department of Labor, State Employment Security Agencies, and CETA prime
sponsors. In addition, these organizations exist in thousands of communities, fifty states and the
territories, and at the national level of government. Instead of a neatly arranged hierarchy with
clear lines of authority, what we have is a loosely coupled functional system with considerably
more power at the middle and bottom than at the top. In addition, all three levels of
organization (federal, state and local) possess certain scarce resources valued by the others.
Each level also has a certain amount of constitutional and behavioral independence.

While the organizations providing vocational education training are loosely coupled with
those providing coordination and guidance, the network of actors and organizations consists of a
tightly coupled policy network, albeit one which lacks elaborated, hierarchical authority
relationships.⁹ It is a network which is boundary maintaining and which has persisted for over fifty
years as a separate entity from the larger general education system. This occurred because of
the differences in orientation, as well as in status, between the two groups. Vocational education
is best described as a functional system that consists of:

1. The set of persons who lack, but want or need, occupational skills or training.

2. The set of agencies, groups, and institutions that serve and train them.

3. The research, evaluation, and training activities that affect the provision of educational
training.

4. The laws, policies, and programs under which vocational education is provided.

"To call this a 'system' is not to imply that it has well-defined, consensual goals and
coordinated programs for reaching them. The institutions included in [this] system tend, in fact,
to behave in a fragmented and disorganized way."¹⁰ Even though disorganized, this is the system
with which persons seeking vocational training must deal.

Karl Weick calls this a "loosely coupled system where the individual organizations in the
system are more like holding companies than goal directed entities." He suggests that this may
be due to the diffuse task vocational education performs and the uncertainty of the technology
used in the process of educating students.¹¹

Performance Tests as an Evaluation Method 

We are assuming here that performance tests are not an innovation to those who will
administer them. Thus, a second assumption may be made; i.e., if the innovation is to be adopted
and effectively used, then one must focus, not on the process of innovation diffusion, but rather
on the implementation of the results of performance tests to local administrators, state vocational
education officials and federal administrators of the Bureau of Occupational and Adult Education.
With this as the focus, several corollaries must be spelled out.

First, any new system of evaluating programs or individuals will increase the programs' and
individuals' uncertainty in regard to their performance. Uncertainty is a key concept in both
organization theory and economics. A person or organization will always try to reduce the






amount of uncertainty they must deal with. People in public organizations, like vocational
education programs, will prefer to be evaluated by instruments that they both understand and
control. Performance testing depends on measurable outcomes. Since it is outcomes
that are aggregated, rather than superiors evaluating whether or not an instructor followed the
correct process of teaching and test administration, the teacher may feel that the outcome
measure of evaluation is unfair since a variety of things may affect the students' scores. Too many
students may be scheduled to take the tests at one time; the quality of equipment may vary from
program to program; teachers may feel that they have more than their share of undermotivated
students. Thus, even if promotion, pay, and transfers are not tied to an evaluation system heavily
relying on performance testing, it would be a major source of uncertainty for those being
evaluated.

This suggests why an innovation model is not appropriate for performance testing as an
evaluation instrument. Many of the innovation models assume that all innovations go through a
sequence of stages approximating the research and development process where technology
dominates the results.¹² This does not apply to educational innovations, like performance testing,
where the advantage of the innovation is very difficult to show
and, in addition, the innovation clearly threatens both teachers and administrators whose
programs will be evaluated with the information they provide. With educational innovations
where the technology is "soft," implementation, not the superiority of the technology, will
dominate outcomes.¹³ Education is not unique in being dominated by the implementation
process.

Simply because teachers and administrators adopt an innovation does not mean that the
adopted practice will be the same as the original innovation. The actual outcome of the
adoption of performance testing will greatly depend on how teachers and administrators
implement it. In a federal system, there are no command and control mechanisms for forcing
compliance with directives from either the state or federal level. Interdependence and [illegible]
invariably shape intergovernmental relations. In this case, as in so many others dealing with
implementation, the "street-level bureaucrats" (the teachers) will largely determine whether or
not the evaluation system produces meaningful information upon which to base program
choices.



Street-Level Bureaucrats and Implementation

The concept of street-level bureaucrats is very important in understanding the introduction
of an innovation into continuing practice. Street-level bureaucrats include teachers, police
officers, welfare workers, public health officers, and many others. All of these employees work with
the public and make decisions on the basis of individual initiative as well as established routine.
They interact directly with citizens. In fact, they are most people's only direct contact with
government. Since they exercise considerable discretion in their jobs, they effectively determine
how policy is delivered to citizens.

In other words: "To accomplish these required tasks, street-level bureaucrats must find ways
to accommodate the demands upon them and confront the reality of resource limitations. They
typically do this by routinizing procedures, modifying goals, rationing services, asserting
priorities and limiting or controlling clientele. In other words, they develop practices that permit
them in some way to process the work they are required to do. . . . [Their] work . . . is inherently
discretionary. . . . It is difficult to establish or impose valid work-performance measures,
and the consumers of services are relatively insignificant as a reference group. Thus street-level
bureaucrats are constrained, but not directed, in their work."¹⁴






These accommodations and coping mechanisms, which they are free to develop, form patterns
of behavior which become the governmental program that is "delivered" to the public. In a
significant sense, then, street-level bureaucrats are the policy-makers in their respective work
arenas.

As Weatherly and Lipsky and Richard Elmore point out,¹⁵ this turns the study of both
innovation diffusion and the process of implementation on its head. The lowest level of the
implementation network determines policy while the upper and mid levels are only able to
circumscribe the behavior of lower officials within certain broad limits. This occurs because in a
loosely coupled, interorganizational and intergovernmental network, goal homogeneity in the
absence of hierarchical authority cannot be assumed. "Interorganizational problems arise largely
from the difficulty of coordinating the activities of several different units, each of which has its
own goals and established routines."¹⁶



There appears to be an inverse relationship between the number of required transactions
between organizations to implement a new program or practice and the likelihood of the
implementation being successful. "Even when the probability of a favorable result is high at each
step, the cumulative product of a large number of transactions is an extraordinarily low
probability of success."¹⁷ A recent study lays out in elaborate detail the multitude of devices and
ploys that experienced administrators can use to subvert, deflect or delay the effect of
programmatic innovations they do not like.¹⁸
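
The inverse relationship described above can be made concrete with a little arithmetic. The
following is a minimal sketch and is not part of the original paper: it assumes that each required
transaction succeeds independently of the others, and the function names and the probabilities
used are hypothetical, chosen only for illustration.

```python
from math import prod


def cumulative_success(step_probabilities):
    """Chance that every transaction succeeds, assuming the steps are independent."""
    return prod(step_probabilities)


def uniform_chain(p, n):
    """Chance of success across n transactions that each succeed with probability p."""
    return cumulative_success([p] * n)


if __name__ == "__main__":
    # Illustrative values only: a high per-step probability over more and more steps.
    for n in (1, 10, 30, 70):
        print(f"{n:>3} transactions at 0.95 each: {uniform_chain(0.95, n):.4f}")
```

Under these assumptions, even a 99 percent chance of agreement at each of 70 transactions leaves
slightly less than an even chance that all of them succeed.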


Given the inability of state and federal officials to control the behavior of local
teachers effectively, what can be done to increase the probability that an evaluation system at
least partly based on performance tests will not be subverted? One technique for arriving at an
estimate of what will be needed to implement a new program or practice successfully is called
"mapping backwards." It proceeds from our assumption that power over the delivery of
vocational education training effectively lies in the hands of the street-level bureaucrats (the
teachers) rather than in the hands of administrative officials at higher levels of government. "In
the bewildering variety of local institutions . . . one factor remains constant: The point at which
public policy meets the private preferences and choices of young people is in individual contacts
between teachers or program operators and young people. This is the street-level contact that
determines whether policy affects the behavior of individual young people."¹⁹

Mapping backwards focuses not on the goals of the administrators at the top, who wish to
use performance tests to determine which programs are successful and which ones are not. It
begins with looking at the behavior of those who will be implementing the performance testing
system, then proceeds to ask the question "what do I want the teacher or local administrator to
do?" Once that question is answered, one traces back through every step in the implementation
process and at each step determines what needs to be done to increase the probability that a
teacher will implement the performance testing system in the prescribed manner.

When the vocational education network is viewed from the bottom up, it becomes clear that
whatever policy we wish to implement ultimately will depend, not on a centralized command and
control system, but on changing the behavior of local teachers and program operators who
actually deliver services to trainees.


The true policy problem that must be faced is not to make teachers behave consistently with
respect to a new evaluation system, but to increase the probability that the teachers' skill,
judgment, and knowledge will affect the ability of trainees to find meaningful and productive
work.







Conclusions

The preceding sections described the network through which a performance test based
evaluation system would be implemented. The paper has also identified where the ability to
shape policy lies and whose behavior must be changed if a new evaluation system is to be
successfully implemented. This section will focus on what can be learned about implementation
and innovation from this discussion. The implication of the paper thus far is that ". . . the process
of framing questions from the top begins with an understanding of what is important at the
bottom."²⁰

With the implementation of any innovation, there are three reasons to cooperate with those
promoting the innovation. The first reason is self-interest. People and organizations join together
because participants perceive the innovation to be in their best interest. Given the variety of
different people and organizations in vocational education, it is unlikely that one innovation will
be perceived in the interest of all or even a majority of the organizations and people in the
network. Therefore, this is not a sufficient base on which to structure cooperation.

A second reason for cooperating is that higher level authorities mandate cooperation.
Innovations that are linked to the governance system of an organization will obviously command
more attention than those that are not. But a mandated evaluation system that has to be
implemented across governmental boundaries and where the institutions involved are loosely
joined will not have the same force as it would if it occurred within one organization.

A third reason for cooperation is exchange. Here, people cooperate because they receive
something they value in exchange for their cooperation. In a loosely joined network this will
facilitate cooperation, as it is unlikely that any one organization will have all of the resources
needed to accomplish its tasks. This creates a positive incentive for mutual exchange of
needed resources.

In reality all three of the reasons for, or inducements to, cooperate will be effective in
certain situations. Also, the three are ideal types, and most interorganizational transactions have
elements of more than one of the three: often one sees an organization adopt a carrot-and-stick
approach to inducing cooperation.

The purpose of defining the three reasons for cooperation is that administrators at federal
and state levels, when they are dealing with local officials, often assume that the local officials'
interests and goals are, or should be, the same as their own. They also may operate as if an
authority relationship existed between them and local officials. As this paper has pointed out,
these are incorrect assumptions and may contribute to the failure of an innovation such as an
evaluation system to be implemented or, if implemented, to provide meaningful data on which to
judge program performance.

If we wish to increase the probability of the implementation of a performance testing system
as an evaluation instrument, we need to map backwards in our analysis from the teacher who will
actually give the tests to the local administrator of the program, to the state vocational education
officials in charge of evaluation, to the federal administrator in the Bureau of Occupational and
Adult Education. This is the reverse of the process that most analysts propose. Systems analysis,
policy analysis, and other rational techniques advocate starting with the goals of federal officials
and mapping forward to the point of service delivery. In the absence of hierarchical control and
common goals, this will not usually be effective.






If we map backward, though, we find teachers who feel a great deal of uncertainty over a new
method of evaluation that they cannot completely control. There are administrators of local
programs and school principals who will wonder where the resources will come from to collect
and tabulate the data generated by the system. These administrators will also know that teachers
will put pressure on them to upgrade the equipment used for performance testing so it will be
appropriate for the newly developed tests.

    "Any kind of broad mandate that occupational competence be demonstrated by
    vocational education students could be viewed as some kind of disaster. The reason
    . . . is quite simple: The mandates always seem to require more than can be produced
    under the constraints which exist."²¹

All of these pressures may dispose a local administrator to oppose or subvert the new
evaluation system. With service delivery and people-processing programs you simply do not get
implementation without resources. It is a necessary but not sufficient condition.²² The sufficient
condition is support for the innovation by the local administrator. In two different review articles
on the implementation of innovations, one that specifically focused on the implementation of
evaluation findings, the support of the local administrator was found to be critical in successful
implementation.²³

Given that vocational education is a bottom-heavy system, what suggestions can be offered
to improve the chances of successful implementation?

1. Map the delivery network backwards from the activities of the teachers to the source of the
innovation.

2. Often local administrators do not comply with a mandate because it is not accompanied by
the resources to implement it. Try to distinguish between an unwillingness to comply and a
lack of capacity to comply.

3. Only attempt to change those activities for which it is possible to specify a clear standard of
performance.²⁴

4. Attempt to intervene as closely as possible to the point of service delivery so that the
innovation is not distorted in the levels between point of service delivery and the source of
the innovation. There must be careful preparation of local personnel so that they are
prepared to implement the new system. Their advice is also needed in shaping the
innovation.

5. Rather than simply monitoring compliance, state vocational education agencies should
emphasize services to local programs.²⁵

6. While state and federal agencies cannot control the implementation process, they can
differentially reward those local programs making the greatest effort to implement the
innovation. The creation and manipulation of a program's incentive structure may be one of
the more effective ways to increase the probability of successful implementation.


The central point that administrators should bear in mind is that while some policies, like
affirmative action, are regulatory in intent, vocational education exists primarily to deliver
services. Here compliance, while important, is secondary to improving the ability of schools and
institutes to deliver services, the quality of which depends, to a great extent, on delegated
control.²⁶




¹Public Law 94-482, October 12, 1976, 90 Stat. 2187.

²Ibid., Section 112(b)(1)(B).

³William W. Stevenson, "The Educational Amendments of 1976 and Their Implications for
Vocational Education," Information Series No. 122 (Columbus: Center for Vocational Education,
The Ohio State University, 1977), p. 5.

⁴Henry Borow, Philosophical, Practical, and Technical Issues Pertaining to Performance Testing
in Vocational Education, unpublished manuscript (Columbus: National Center for Research in
Vocational Education, The Ohio State University, 1978), p. 7.

⁵Samuel Halperin, "Emerging Educational Issues in the Federal City," Occasional Paper No. 42
(Columbus: National Center for Research in Vocational Education, The Ohio State University,
1978), p. 7.

⁶Donald A. Schon, Beyond the Stable State (New York: Norton, 1971), pp. 123-124.

⁷Ibid., p. 140.

⁸I. Feller, D. C. Menzel, and A. J. Engel, Diffusion of Technology in State Mission-Oriented
Agencies (Center for the Study of Science Policy, Institute for Research on Human Resources:
Pennsylvania State University, 1974), p. 31.

⁹The National Center for Research in Vocational Education, "The Status of Vocational Education:
School Year 197[illegible]," Research and Development Series No. 162 (Columbus: Ohio State
University, 1978), p. 64.

¹⁰Schon, Beyond the Stable State, p. 4[illegible].

¹¹Karl E. Weick, "Educational Organizations as Loosely Coupled Systems," Administrative
Science Quarterly 21, no. 1 (March 1976): pp. 1-19.

¹²For example, see Ronald G. Havelock, Planning for Innovation (Center for Research on
Utilization of Scientific Knowledge, Institute for Social Research, University of Michigan, Ann
Arbor, Michigan, 1973).

¹³Paul Berman, "The Study of Macro and Micro Implementation," Public Policy 26, no. 2 (Spring
1978): p. 161.

¹⁴Richard Weatherly and Michael Lipsky, "Street-Level Bureaucrats and Institutional Innovation,"
Harvard Educational Review 47, no. 2 (May 1977): p. 172.



¹⁵Ibid., p. 173 and Richard F. Elmore, "Mapping Backward," paper presented at the annual
meeting of the American Political Science Association, September 1979, Washington, D.C.

¹⁶Robert S. Montjoy and Laurence J. O'Toole, Jr., "Toward a Theory of Policy Implementation,"
Public Administration Review 39, no. 5 (September/October 1979), p. 2/3.

¹⁷Elmore, "Mapping," p. 10. He is reporting the results of a study of the implementation process
by Jeffrey Pressman and Aaron Wildavsky, Implementation (Berkeley: University of California
Press, 1973), pp. 87-124.

¹⁸Eugene Bardach, The Implementation Game (Cambridge, Mass.: MIT Press, 1977).

¹⁹Elmore, "Mapping," pp. 29-20.

²⁰Ibid., p. 21.

²¹J. Stanley Ahmann, "Implications of the Minimum Competency Testing Movement for
Performance Testing in Vocational Education," unpublished manuscript (Columbus, Ohio:
National Center for Research in Vocational Education, 1979), p. 20.

²²Weatherly, "Street-Level," p. 182.

²³William L. Hull, "Implementing Evaluation Findings," manuscript (Columbus: Center for
Vocational and Technical Education, Ohio State University, 1970), p. 3 and Donald C. Orlich,
"Federal Education Policy," Educational Researcher 8, no. 7 (July/August 1979): p. 6.

²⁴Richard F. Elmore, "Complexity and Control," Institute of Governmental Research, Public Policy
Paper No. 11 (Seattle: University of Washington, 1979), p. 42.

²⁵Weatherly, "Street-Level," p. 195.

²⁶Elmore, "Complexity," p. 9.






Considerations in the Implementation
of Performance Testing

Curtis R. Finch
Virginia Polytechnic Institute and State University
Blacksburg, Virginia

In our society, frequent change is inevitable. Employment opportunities shift, new
occupations are established, and employers revise their expectations of workers. Change has
also become quite prevalent in education. Rich,¹ for example, notes a variety of educational
movements and innovations that have been proposed over the past two decades. Among these
are the open classroom concept, career education, and mainstreaming.

In recent years, the notion of educational change has fallen into disrepute. This state of
affairs is at least partially due to teachers' perceptions of benefits derived from it. During the
1950s and 1960s, teachers were strongly encouraged to accept change and cooperate with
others to ensure that it occurred. They were often told that a change would result in certain
benefits such as greater efficiency or increased student learning. This, of course, did not occur
in some cases, and teachers rapidly became disillusioned with change for the sake of change.

While a simple definition of change may be any alteration in the status quo, this does not
take the basic concerns of educators into consideration. A more expansive definition must be
used for educational change. It may thus be thought of as any significant alteration in the status
quo that is intended to benefit the people involved.² Such a definition reflects the need to
implement only those changes that have the greatest potential for positive payoff.

This paper examines one such change, giving consideration to its implementation in
vocational education settings. Performance testing appears to have great potential for improving
the educational process and the results of that process. However, its potential may never be
realized if educators and others are not attentive to factors that hinder implementation in the
schools.

As the other papers have noted, performance testing is a rather complex phenomenon. And
once philosophical, legal, and technical issues surrounding performance testing have been at
least partially resolved, there is still the need to deal with a host of implementation
considerations. They include the basic implementation setting as well as the curriculum,
teachers, support personnel, administration, students, and the community. Each of these areas
will be examined in order to highlight some of the key issues associated with implementing
performance testing in vocational education.



The Implementation Setting

When change is being considered, it may be most beneficial first to examine the setting in
which change will take place. Hull, Kester and Martin³ note the three elements that can provide






the necessary stimulation for change to occur. These include the change advocate, the targeted
consumer, and the innovation. In the application of this basic notion to performance testing,
consideration may also be given to several other key elements, namely the curriculum and the
community.

The change advocate serves as an initiator of the change process. Logically, if change is to
occur, some person, group, or organization must provide initial support. Vocational education
administrators and supervisors tend to be most readily classed as change advocates; however, it
is best to go beyond these individuals and consider others such as vocational teachers, ancillary
personnel, students, parents, employers, and even professional organizations.

A second key element is the targeted consumer. Consumers are those who will actually use
the innovation, not merely pass it on to others. They may, likewise, be persons, groups, or
organizations. While the change advocate is hopeful that all consumers are eager to accept
change, this is typically not the case. Some consumers are more adoption prone than others and
are thus more receptive to change.

The innovation, which constitutes a third element, may have almost any form, dimension, or
substance. In this instance, performance testing is reflective of a system that may serve as a
basis for instructional improvement, evaluation, and accountability. If the Hull and Wells⁴ scheme
for classifying vocational education innovations were applied, it might be difficult to determine
whether performance testing would be individual-behavioral, organizational-legislative, or
scientific-technological. Classification may, in fact, be a function of the intended use and
associated technology of performance testing.

Of equal relevance to change is the vocational education curriculum. Any
change must be woven into the curriculum in such a manner that it is accepted and utilized. In
terms of performance testing, thought should be given to a variety of areas including the
alignment of tests, objectives, and the employment setting; varying technical content; and
varying instructional settings. Each of these may affect the ways that performance testing is
ultimately implemented in the schools.

The community is yet another element to be considered when change is taking place.
Included in the community setting is a host of persons who must be dealt with at various points in
time. These include citizens, individual taxpayers, school board members, owners, managers,
supervisors, personnel directors, and advisory committee members.⁵ In this arena, concern tends
to be expressed about the quantity and quality of education as well as how much vocational
education will assist business and industry to grow and prosper. Community concern about
change is extremely important since endorsement or lack thereof can spell success or failure.
While individuals and groups in the community do not have day-to-day contact with vocational
education, many are in a position to influence resource allocation and support for funding.


Curricular Considerations

The vocational education curriculum can be viewed as more than courses and content.
Realistically, it reflects a broad range of educational activities and experiences. Given this
perspective, the curriculum may be defined as "the sum of the learning activities and experiences
that a student has under the auspices or direction of the school."⁶ Thus, included in the curriculum
are classroom, laboratory, and cooperative work experiences; cocurricular activities such
as clubs and vocational student organizations; organized athletics; and music groups. It is within
this setting that performance testing is intended to be implemented.






One basic curricular consideration has to do with the alignment of educational objectives,
performance testing, and the employment setting. While educators have recognized for many
years that instructional objectives for vocational education should be closely aligned with needs
of business and industry, it has only been recently that organized groups have taken over the
vocational teacher's responsibility to identify relevant objectives.

Consortia such as the Vocational and Technical Education Consortium of States (V-TECS)
and the Interstate Distributive Education Curriculum Consortium (IDECC) have, in fact, worked
toward the alignment of objectives and the work setting. This has consisted of developing
objectives and (in the case of IDECC) learning activity packages (LAPs) that are based upon
extensive task analyses and personal interviews with workers and employers. Given this situation,
it appears quite easy to move toward performance test implementation (if it has not already taken
place).

V-TECS, for example, has developed catalogs of objectives and criterion-referenced
measures that might serve as a basis for test development. IDECC includes check sheets in many
LAPs that can be used to evaluate student performance in applied settings. Of major concern is
the potential that exists to develop tests that align with instruction, objectives, and job-relevant
content. The extent to which tests mesh with teacher and consortium efforts may well determine
whether or not performance testing is accepted and used.

A second curricular consideration is that of test content variation. Performance test content
varies as a function of curriculum content and, as such, may require different approaches to
development and use. A close look at code numbers used for occupations in the Dictionary of
Occupational Titles⁷ reveals that workers have varying degrees of involvement with data, people,
and things. For example, a salesperson would have a high degree of involvement with people, a
computer programmer would work more extensively with data, and a welder would be more
involved with things. While test developers tend to perceive such differences in tests,
administrators and teachers may not be as aware of how curriculum content is translated into
meaningful test content. If these variations are not taken into consideration, performance test
relevancy may be seriously affected.

A somewhat similar situation exists with regard to the instructional environment. Tests and
the testing process tend to vary as a function of the instructional setting. Thus, a test that is
designed for use in a vocational laboratory may not be applicable to evaluation in cooperative
employment settings. This could occur because students are paid for participating in a
cooperative vocational program and report to an employer, whereas in a school setting they are
not paid and report to an instructor. In the school setting, instructors have complete control over
the testing situation, while in a cooperative setting this control is shared with employers. As the
implementation of performance testing occurs, a close look needs to be taken at ways that tests
can be adapted to different environments as well as what shared testing responsibilities may
exist. This will at least partially alleviate some of the problems associated with testing in various
instructional settings.

Mention must also be made of how performance testing may interface with the
competency-based education (CBE) movement. While CBE has been in existence only a short
time, its impact is being felt in all parts of the nation. Some states have, in fact, mandated the
implementation of CBE by a specified date. Although CBE does not differ from other modes of
education in terms of its goals, there are several key elements that serve to make it a powerful
movement. These include using the competency (skill, attitude, value, or appreciation that is
deemed critical to successful employment) as a basis for curriculum content, making available
explicit criteria for each competency, assessing competence in applied settings, having
demonstrated competence serve as a determiner of student progress, and focusing on facilitation
of student achievement of competencies.⁸ It is clear that CBE and performance testing have the
potential to work as a team, and in many functioning CBE programs that is the case. Any steps
taken to implement performance testing should thus be coordinated with existing or proposed
CBE activities. Obviously, it is much easier to effect one educational change than two separate
changes!

Teacher and Ancillary Personnel Considerations 



In many respects, teachers and ancillary personnel may be considered as the basic
advocates and consumers of performance testing. These individuals administer
tests, determine results, and make professional decisions based on these results. Ancillary
personnel include guidance counselors, placement officers, and similar specialists. These
persons are in an excellent position to help students enroll in meaningful programs
and assist program graduates find employment. While teachers obviously have the major
responsibility for performance testing in instructional settings, they are often heavily involved in
student selection and placement activities and may work quite closely with ancillary personnel.

One basic consideration with regard to these groups is acceptance of the performance
testing concept. Many may see performance testing as a threat to their positions, something that
serves to hold them accountable for student achievement. Performance testing may be viewed by
others as being no different from what is being done at the present time. This situation is
particularly difficult to handle since professionals believe that they are already doing what is
proposed. Others, however, might not be aware of performance testing's complexities and may
only recognize their personal interpretations of the concept. Clearly, acceptance will be most
difficult among persons who have misconceptions about performance testing. In fact,
professionals who have had the least involvement with performance testing may be most eager
and ready to implement it.

Running parallel to the acceptance concept is the expertise needed to conduct performance
testing. Sanders⁹ notes several potential problems associated with performance testing
administration. These include control over the testing environment and standardization of testing
conditions and scoring procedures. Test administration processes are reasonably common
knowledge to measurement specialists and those who have had experience in developing and
administering valid and reliable performance tests. Vocational teachers, on the other hand, have
not always been exposed to the psychometric properties of performance tests and how these
properties may be altered through test administration. If performance testing is to be
implemented in vocational education, the knowledge gap must be narrowed.

While teachers are not expected to become measurement specialists, they should at least
have a working knowledge of factors that can affect test validity and reliability. A poorly
developed and administered test is worse than no test at all. Consequently, any implementation
scheme must deal directly with improving teacher knowledge and showing how this knowledge
may be applied to realistic educational settings and testing situations.

Since some teachers and support personnel must be convinced to accept performance
testing and to learn about its unique character, how may this task be accomplished? One
approach consists of inservice education. Credit or noncredit workshops could be offered that
provide educators with an awareness of performance testing, an understanding of its strengths
and limitations, and an opportunity to conduct tests under the supervision of workshop leaders.






One key aspect of the inservice education process is motivation. If educators are not
positively motivated to participate in inservice education, any proposed implementation may be
doomed to failure. Palmer notes that both extrinsic and intrinsic motivation are used to
encourage educators to improve their performance. With regard to extrinsic motivation:

The impetus may come from rule enforcement (making participation in inservice
programs a requirement of the job), or from rewards that are valued by the participants
but do not stem from improved performance (such as bonuses, increments, certificates,
etc.).¹⁰

Persons who relate most closely to extrinsic motivation are those who have not yet satisfied
their basic needs or do not obtain satisfaction of higher order needs from their work. As far as
intrinsic motivation is concerned, "the impetus for improvement may come from a desire to do a
better job of teaching. Intrinsically motivated teachers derive satisfaction directly from the
performance of their teaching duties."¹¹

Clearly, it would be desired that educators involved in performance testing inservice
programs be intrinsically motivated. Some educators, of course, will not be motivated in this way
and thus must be reached through extrinsic means. Then, once involved in an inservice program,
these persons may become intrinsically motivated to implement performance testing in their
vocational programs.



Administration Considerations 

Even though teachers and ancillary personnel accept performance testing as a worthwhile
concept and have been trained to use tests, the implementation process is by no means
complete. There are several factors in the administration of a performance testing program that
must be examined very closely. These factors can serve either to enhance or hinder
implementation depending upon how they are handled. Among the more critical factors are
test scheduling, test facilities, determining students' grades, and communicating test results.

It is reasonably easy to schedule a classroom pencil-and-paper test. In this instance,
students are all brought into the classroom, sit at different desks, and are each given a written
test to complete. Performance testing takes on a somewhat different air. Students typically take
performance tests individually or by small groups in laboratory or work settings. In most cases,
actual equipment, materials, and people are used to make the test as realistic as possible. These
requirements often place a heavy burden on vocational educators since it may be difficult to
arrange test schedules in an acceptable manner and have adequate supervision available. In
military technical training, where performance testing has been used successfully for over thirty
years, scheduling is of major importance.¹² In fact, blocks of time for performance testing are
built directly into students' training schedules, and instructors are assigned to coordinate and
monitor testing activities. Time made available for testing may be as much as six hours, and
student-instructor ratios of six to one are typical. Given this situation, it is easy to see why
performance testing in military settings is so successful. Students may be tested individually
under controlled conditions under the watchful eyes of skilled instructors. They are placed in
controlled environments before and after completing the test so that answers are not passed on
to others.


The military testing model indicates some of the major scheduling problems that may occur
when performance testing is carried on in school settings. While recognizing that military and
civilian vocational education do differ, educators should be aware of these
concerns. The successful implementation of performance testing will require that answers be found
to questions such as: How many blocks of time should be scheduled exclusively for
performance testing? What must be done to ensure a reasonably low student-instructor ratio
during the testing process? How will test security be controlled before, during, and after students
are tested?

As an alternative to scheduling blocks of time, teachers might test on an individual
basis whenever students appear ready. This may be quite easy when
the performance is of a manipulative nature. As with group testing situations, it is essential
for the teacher to use standardized equipment, materials, directions, and conditions. Additionally, in
the case of tests involving fault diagnosis (e.g., electronic and automotive), a
large number of representative troubles must be at hand. Otherwise, hints may be passed on
from one student to another, with the result being invalid test results.

A natural outgrowth of scheduling processes is the establishment of testing facilities.
Numerous authors have emphasized the need for a facility or area that may be used exclusively
for performance testing. Wilson indicates that it is "highly desirable" that a separate area
be "set aside for conducting performance tests."¹³ Doing so helps to
ensure that uniform conditions are set up for all examinees. While this notion may seem
farfetched to vocational educators in the public schools, it is one which should be seriously
considered. Having uniform test conditions allows all examinees an equal opportunity to do their
best work. Within the testing area, it is extremely important to have equipment and materials
which are the same from one test administration to another. If examinees are tested with
differing equipment and materials under varying conditions, test scores will not reflect
performance against a standard criterion. Vocational educators must recognize the need for
standardization in testing processes and adhere to these standards whenever tests are
administered to students.

Although not exclusively associated with performance testing, the issue of grading is often raised
when student achievement is discussed. Whatever teachers think of the purposes grades serve,
grading is an integral part of our educational process and, as such, must be
dealt with as performance testing is implemented. One practical consideration is the way or ways
that performance test results can be translated into a locally established grading scheme.
Situations may point to a host of potential grading problems.

A final administrative consideration has to do with articulation. Performance testing has
enabled students to receive advanced placement at community colleges and technical
schools, and it can assist groups of educators to note exactly what is expected of students in various
educational settings. Thus, as performance test implementation takes place, a look should be
taken beyond individual courses and schools to see how processes might be articulated with
schools and programs at other levels (e.g., secondary, postsecondary, adult, CETA).



Student Considerations 



While students' needs and interests are often considered as vocational curriculum content is
being established and teaching/learning strategies are being selected, this is generally not the
case when tests are being devised. Apparently, some teachers have felt that testing is a secret
process that must not be revealed to anyone until some appropriate time. Students are thus left
to develop high levels of anxiety and to engage in testing activities that are very unfamiliar to them.
Obviously, if such practices are followed with regard to performance testing, the end result will
be even greater anxiety and frustration. An alternative to the possibility of utter chaos is placing
greater emphasis on students' concerns and being sure that these concerns are built into the
testing process.

Initially, it might be best to examine students' acceptance of the performance testing
concept. Since some students have only taken pencil-and-paper tests, they may not understand
what performance testing is. For these students, it would be necessary to design some sort of
orientation program that clarifies performance testing procedures, provides each person with
"hands-on" experiences, and generally relieves anxiety. This approach should serve to improve
students' acceptance of performance testing and speed the implementation process.

A second consideration has to do with student contributions to testing. Students can be
given opportunities to help design tests. For example, if a test involves "cutting a piece of metal
with an oxy-acetylene cutting torch," students might talk to welders about the standards
tradespersons would use to evaluate such a cut. They might read technical manuals to determine
meaningful process and product criteria. The information could then serve as a basis for
evaluating student performance. Students would, thus, be more aware of how they are expected
to perform and where test standards come from. Even though students are seldom involved in
test design, the nature of most performance tests makes this procedure reasonably easy to carry
out. It should not detract from the validity of most tests and will certainly reduce student anxiety.

A final student consideration has to do with evaluation of testing. All too often, teachers do
not report to students about how well they perform on tests. Students do not like this sort of
treatment, and it will affect their attitudes to any type of testing, including performance testing.
While written nearly thirty years ago, Michaels' and Karnes' comments about performance testing
are still very appropriate: "After the test has been administered and scored, discuss with the class
outstanding strengths and weaknesses noted. Give the students an opportunity to ask questions
and clear up any misunderstandings."¹⁴

Reporting results to students helps them understand the importance of time, efficiency,
proficiency, quality, and similar performance criteria. It also serves to reinforce the importance of
doing one's best work on a test and amplifies the need to follow test directions and procedures.




The Community 



Even though individuals in the community may have little involvement with performance
testing, they must not be left out of the implementation process. In this case, the approach taken
is more akin to public relations, with key groups in the community being informed about
performance testing. With parents, communication needs to be started early in the implementation
process, and they should be told why performance testing is being used as well as what it
means to their children. When a youngster comes home one day and mentions a
performance test, the parent should already have some notion about such tests. Keeping
parents informed serves to strengthen support for performance testing in the schools, especially
if those parents take an active part in reinforcing comments made to students by their teachers.

Most vocational programs enlist the assistance of advisory committees composed of
business and industry representatives. These committees advise and assist educators
by verifying the need for instruction, examining course content, providing teachers with technical
assistance, and providing various services to students, the school, and the community.¹⁵ Any
performance testing implementation plan should give consideration to advisory committee
involvement. This may range from informing members about performance testing to soliciting
ideas for test development.

Advisory committees help to link education and work and, as such, can provide invaluable
services. The vocational teacher should, therefore, draw heavily upon this resource whenever
tests are being developed and revised. Assistance might consist of identifying appropriate work
samples, identifying potential criteria, selecting equipment and materials, and reviewing testing
and scoring procedures. Extensive involvement by advisory committees will contribute greatly to
the solidification of community support since members tend to be key leaders in their respective
occupational areas. Their support of the performance testing concept will be looked upon by
other employers as a very positive sign.


Employers, other than advisory committee members, also need to be informed about
performance testing. As the consumers of vocational education products (graduates), employers
should have a basic understanding about how vocational students are tested and how test
performance aligns with work performance.

In order to keep employers informed, some vocational programs have developed
performance-based transcripts that indicate what the individual student is able to do in terms of
tasks and skills rather than merely using a statement of grades. This approach lets the employer
know what to expect of a program graduate and helps in determining the initial duties that
persons will have on the job. The basic focus of performance tests can easily serve as a
foundation for transcripts. Details such as the level of acceptable behavior and conditions might
also be included for each listed item.

Employers appear eager to find out more about what potential employees can do, and
performance testing has the potential to meet their needs, particularly if a meaningful
communication device such as a performance-based transcript is developed and used.






Summary 

Implementing performance testing in vocational education settings is a complex process.
Those responsible for implementation must take a host of factors into account and work with
numerous groups and individuals if any sort of success is expected to occur. The character of
vocational education demands that linkages be developed with persons in education as well as in
the community at large. Teachers, support personnel, administrators, and students each have a
role in performance test implementation. Failure to include one or more of these groups in
implementation plans will most certainly work against the movement.

Finally, parents, advisory committee members, and employers play an important part in the
implementation process. Their collective support ensures that performance testing will be
recognized as being beneficial to persons outside of education.

The message is clear that implementing performance testing in vocational education will be
a difficult, time-consuming task. However, given the many benefits derived from performance
testing, any time devoted to implementation will be well spent.




Notes



¹John M. Rich, Innovations in Education: Reformers and Their Critics (Boston: Allyn and Bacon,
1978).

²Ronald G. Havelock, The Change Agent's Guide to Innovation in Education (Englewood Cliffs,
N.J.: Educational Technology Publications, 1973).

³L. Hull, J. Kester, and B. Martin, A Conceptual Framework for the Diffusion of Innovations
(Columbus, Ohio: Center for Vocational and Technical Education, The Ohio State University,
1973).

⁴William L. Hull and Randall L. Wells, The Classification and Evaluation of Innovations in
Vocational and Technical Education (Columbus, Ohio: Center for Vocational and Technical
Education, The Ohio State University, 1972).

⁵Curtis R. Finch and Robert L. McGough, Administering and Supervising Occupational Education
(Englewood Cliffs, N.J.: Prentice-Hall, in process).

⁶Curtis R. Finch and John R. Crunkilton, Curriculum Development in Vocational and Technical
Education (Boston: Allyn and Bacon, 1979), p. 7.

⁷U.S. Department of Labor, Dictionary of Occupational Titles, 3d ed. (Washington, D.C.: U.S.
Government Printing Office, 1965).

⁸Finch and Crunkilton, Curriculum Development, pp. 220-21.

⁹James R. Sanders, "Measurement Problems and Issues Related to Applied Performance Testing"
(Paper presented at AERA, April 1976).

¹⁰Teresa M. Palmer, "In-service Education: Intrinsic Versus Extrinsic Motivation," in Louis J.
Rubin, ed., The In-Service Education of Teachers (Boston: Allyn and Bacon, 1978), pp. 215-19.

¹¹Ibid., pp. 215-19.

¹²Robert Glaser and David J. Klaus, "Proficiency Measurement," in Robert Gagné,
Psychological Principles in System Development (New York: Holt, Rinehart & Winston, 1962),
pp. 419-74.

¹³Clark L. Wilson et al., A Manual for Use in the Preparation and Administration of Practical
Performance Tests (Washington, D.C.: Office of Naval Research, 1971), p. 48.

¹⁴William J. Michaels and M. Ray Karnes, Measuring Educational Achievement (New York:
McGraw-Hill, 1950), p. 366.

¹⁵Center for Vocational Education, Organize an Occupational Advisory Committee, Module A-4
(Athens, Ga.: AAVIM, 1978), pp. 8-9.






IMPLEMENTATION ISSUES: 
COMMENTS 



Comments on Implementation Issues
in Vocational Education



Janet E. Spirer
National Center for Research
in Vocational Education
Columbus, Ohio



"The best laid plans..." is a well-worn phrase that we have all heard and probably find
ourselves muttering from time to time. It certainly may be applied to publicly funded programs,
where it is often acknowledged that there is a gap between policy intentions and policy
implementation. Recognizing the tendency for this gap to exist, and often to expand, is crucial,
regardless of the policy or program being implemented. The two implementation papers present
some concerns with which administrators and teachers must deal when implementing
performance testing.

The authors broach the implementation issue from two different perspectives which appear
to be complementary. Milward discusses the process by which an evaluation system, partially or
completely relying on performance testing, can be implemented. He explains how ideas or issues
come to the fore (i.e., "ideas in good currency") and who should be involved in designing
implementation strategies ("street level bureaucrats"). The major strength of Milward's paper lies
in its generalizability. That is, administrators could apply the concepts Milward introduces to any
program planned or currently in operation.

If an administrator sat down and, as Milward suggests, "mapped backwards" to identify those
persons who should be involved in the implementation process, the "considerations" addressed
by Finch certainly would emerge. Finch's paper is written more pragmatically and should help an
administrator begin to identify specific audiences (and what he terms "considerations") that
might affect the implementation process. These include curricular considerations; teacher and
ancillary personnel considerations; administration considerations; student considerations; and
community considerations.




Thus, while Milward's paper introduces the process by which an administrator implements
any evaluation system, Finch provides the reader with a "laundry list" of who and/or what
"considerations" might affect the implementation of performance testing. However, a note of
caution is appropriate. While the implementation process is generic, each vocational education
program or school exists in an individualized environment with its own set of actors, constraints,
and problems. Therefore, Finch's "considerations" should serve only as the first step when
"mapping backward." This handbook, as a whole, deals with other considerations that might
prove to be as important as, if not in some cases more important than, Finch's for a specific
vocational education program or school. For example, some legal considerations, especially if a
state has adopted a minimum competency testing law, might be crucial to successful
implementation. Or, the institution of performance tests that are not proven to be valid and
reliable might undermine the entire implementation process.






Also, the purpose behind performance testing, an evaluative tool to improve programs and
student learning, should be focused on as the implementation process is designed and then
carried out. Dissatisfaction with evaluation's usefulness has produced an extensive body of
literature contending that evaluation seldom influences program decision-making. However,
studies have been reported that deviate from this stream of thought. For example, Michael Q.
Patton, Edward C. Weeks, and Marvin C. Alkin et al.¹ have made strong cases for the usefulness
of evaluation by adopting a broader definition of utilization.

The literature is replete with suggestions for increasing the utilization of evaluation
information. For example, Weeks² offers three factors thought to affect the use of evaluation
findings: (1) organizational location, (2) methodological practices, and (3) decision context.
Alkin et al.³ have identified eight factors affecting the utilization of evaluation information. These
include (1) preexisting evaluation bounds, (2) orientation of the users, (3) evaluator's approach,
(4) evaluator credibility, (5) organizational factors, (6) extraorganizational factors, (7) information
content and reporting, and (8) administrator style.

Regardless of whether one subscribes to Weeks' model, Alkin et al.'s model, or other models
appearing in the literature, inherent in all of these models are factors which need to be carefully
identified and defined in order to implement a performance testing program. [...]
"mapping backward" as a method to identify the concerns and their [...]
"Considerations" often will surface in this process. However, the point to be made here is that no
author can identify, a priori, the actual considerations that will be appropriate in every setting.

These papers describe the implementation process and some considerations that may be
appropriate. But the final list of considerations that emerge when the implementation process is
conceptualized and then carried out must be individualized to meet the specific needs of a
vocational education policy, program, or school.






IMPLEMENTATION ISSUES: 
COMMENTS 



Notes 



¹For example, see Michael Q. Patton, Utilization-Focused Evaluation (Beverly Hills, California:
Sage Publications, Inc., 1978); Edward C. Weeks, "The Managerial Use of Evaluation Findings," in
H. C. Schulberg and J. M. Jerrell (Eds.), The Evaluator and Management (Beverly Hills, California:
Sage Publications, Inc., 1979), pp. 137-265; Marvin C. Alkin, Richard Daillak, and Peter White,
Using Evaluations (Beverly Hills, California: Sage Publications, Inc., 1979).

²Weeks, "The Managerial Use," p. 139.

³Alkin et al., Using Evaluations, p. 235.






CHAPTER SIX 



IMPLICATIONS FOR 
VOCATIONAL EDUCATION 






The first paper, by Robert E. Spillman and Charles D. Wade, begins by exploring different
perceptions of vocational education (e.g., human resources view, humanistic view, social reform
view, and general education view). They then discuss why four issues (legal mandates, human
resources needs, student needs, and institutional and curriculum concerns) are important for
vocational education. The paper concludes by offering the response they feel vocational
education must make to the philosophical, technical, legal, and implementation issues raised in
the handbook.






In the second paper, Nellie Carr Thorogood also deals with the question of implications for
vocational education. Using a different format from Spillman and Wade, she looks at the role
of "stakeholders" in vocational education and performance testing and at the uses of performance
testing in vocational education, and discusses the implications of the issues raised by the
contributors by delineating those internal to and external to the institutions. A third perspective
on the implications of the four issues for vocational education is presented by Marvin R.
Rasmussen in the Comments paper.






IMPLICATIONS OF
THE ISSUES



The Implications of the Issues for Vocational Education:

A Viewpoint

Robert E. Spillman
Charles D. Wade

Bureau of Vocational Education
Frankfort, Kentucky



Introduction

Performance testing is a tool which can be used by vocational educators to improve the
quality of programs, enhance the learning process by students, and strengthen the accountability
of vocational education. It is not without problems or limitations, but with careful
planning the process can be effectively implemented in vocational education programs.

The purpose of this chapter is to review the major issues in performance testing identified by
the authors of the previous chapters and to bring into sharper focus the implications for
vocational education.

The contributors to this publication agree with Slater's definition that "performance tests
refer to tests in which the test stimulus, the desired response, and the surrounding conditions
approximate the reality of an actual situation drawn from a specific occupational or role-based
context."¹ Several of the contributors discuss in detail the variety of reasons for performance
testing. The consensus seems to be an agreement with Slater's four major purposes: (1)
formative program evaluation, (2) summative program evaluation, (3) instructional management
and decision-making, and (4) student certification.


At this point, the reader begins to identify some conflicts among the philosophical, technical,
legal, and implementation issues surrounding performance testing. To relate both commonalities
and differences of the issues of performance testing to vocational education, some
understanding of the purpose of vocational education is necessary.



Exploring Different Perceptions of Vocational Education

There is no widely accepted statement describing the purpose of vocational education. 
Although various documents from the federal government, state education agencies, and local 
institutions address the purposes of vocational education, no effort is made in this chapter to 
persuade the reader to accept or reject these purposes. Rather, this chapter will simply explore 
some different perceptions of vocational education. 

Human Resources View. Some believe vocational education is responsible for supplying a
pool of well-trained people from which business and industry can select employees. This view
requires that the graduates have entry-level job skills and appropriate attitudes that make them
productive on the job and contributors to the economic growth of the community, state, and
nation. In this perspective, service to the economic system dominates service to the individual.






SPILLMAN & WADE

Humanistic View. From this view, vocational educators are responsible for preparing all
vocational students for employment in their chosen vocations. The needs and desires of
students are given major consideration in all aspects of the program. Students are challenged to
achieve to the highest level of their potential, regardless of the local availability of jobs. From this
point of view, the graduate, in a mobile society, seeks employment in a broader area and
becomes a contributing member by being trained for maximum contribution. Curriculum
decisions are more sensitive to individual needs than to local job market requirements.

Social Reform View. Recent federal legislation has highlighted this view by giving less
attention to human needs and desires and more attention to increasing the enrollment of both
sexes in nontraditional classes. Again, education is asked to be the leader in removing social
deficiencies, such as discrimination based on sex, race, economic deprivation, and physical or
mental handicaps. In attempting to meet these needs, vocational educators are often faced with
conflicts when the community expresses resistance to the social reforms. Parents may not want
their children in nontraditional programs, and employers may be slow to employ graduates for
nontraditional jobs. The social reform approach maximizes access to all programs for any
student and pressures traditionalists to accept contemporary societal goals.

General Education View. This view acknowledges the need for the institution to assist
students in making meaningful career choices; it also promotes the idea that specific skills
should not be taught in the institutional setting. In this view, the students should be given
economic awareness, self-awareness, and career awareness, with the specific skill training left to
the employer. Supporters of this concept believe all students should receive some orientation to
a variety of occupations without spending extensive periods of time in developing competencies
in a specific occupation. More time is spent socializing the students to the labor force than
developing skills.

All of this leads up to the fact that the implications of performance testing for vocational
education depend not only on an understanding of performance testing but also on a perception
of the purposes of vocational education. In Chapter Two, Borow discusses some of the conflict
that occurs between the goals of optimum human utilization and the objectives of maximizing
personal potential.

Important Issues for Vocational Education 

The intent of this handbook is to identify issues underlying performance testing as they
relate to vocational education. Perhaps one question which should be asked is why vocational
educators are concerned with performance testing at this time. In Chapter Five, Milward clearly
states that performance testing per se is not an innovation in vocational education. The brief
history of performance testing in the Preface indicates that this form of testing has been
acknowledged and, in fact, used by vocational educators for many years. The answer to the
current concern may be found in the new degree of sophistication in the tests, testing
procedures, and test analysis and in the innovative uses of performance testing. Why these
issues are important for vocational education can be discussed in four areas: (1) legal mandates,
(2) human resources needs, (3) student needs, and (4) institutional and curriculum concerns.

Legal Mandates. While Public Law 94-482 (the Vocational Amendments of 1976) and its
resulting regulations do not specifically require performance testing, it is certainly a method to
be considered in addressing the requirements for program evaluation. Section 104.402 of the
Rules and Regulations states:

"The State board shall, during the five-year period of the state plan, evaluate in
quantitative terms the effectiveness of each formally organized program or project
supported by Federal, state, and local funds. These evaluations shall be in terms of: . . .

(b) Results of student achievement as measured, for example by:

(1) Standard occupational proficiency measures;

(2) Criterion referenced tests; and

(3) Other examinations of students' skills, knowledge, attitudes, and readiness for entering
employment successfully."²



State boards have struggled with this area of evaluation. Performance testing has not been
widely accepted as a program evaluation tool. Slater's summative program evaluation description
is appropriate for describing the utilization of performance testing for program evaluation. As
indicated by Milward, performance testing for program evaluation is innovative and must
encounter the implementation problems that he and Finch address in Chapter Five. According to
Pullin, there may also be legal implications, such as a situation in which program quality requires
termination of an instructor's contract.

In three-fourths of the states, legislatures have considered some form of minimum 
competency testing, according to Tractenberg. A few states, by policy and regulation, have 
mandated competency-based vocational education and its related curriculum-based performance 
testing. Borow describes a relationship between competency-based programs and performance
testing. As these programs grow in acceptance, states are mandating local participation.

Student certification in occupations seems to be increasing. Performance testing for student
certification in vocational areas has generally been limited to the health and personal services
areas such as nursing, cosmetology, and barbering; however, licensing requirements for aviation
mechanics and communication electronic operators have existed for years. Newer efforts include
certification of fire fighters, emergency medical technicians, and automobile mechanics.

According to Pullin and Tractenberg, the area of student certification, and its legal
implications, is a major concern. For those adhering to the human resources perception of
vocational education, student certification is a positive step for any occupation, since it gives the
employer greater assurance of hiring a quality employee. Persons with other views of vocational
education may resist performance testing for student certification; however, new occupations
may mandate such student certification for graduates who wish to work in those occupations.

Whatever one's perception of the purpose of vocational education, the legal mandates by the
federal government, state governments, and occupational boards and agencies make
performance testing a concern for vocational educators.


Human Resources Needs. For a large number of vocational educators, advisory committees,
and business and industry representatives, needs for human resources deserve special attention.
If vocational programs are to be accountable to employers, students must be trained in the
entry-level skills required for the job. In Chapter Three, Klein presents a model for determining job
competencies as well as for developing performance tests. Proper performance tests can
measure each student's job competency and the entire program's proficiency in relating to actual
job requirements.

Not only do graduates need initial job skills, but they must also possess that
difficult-to-measure trait called "employability." Borow discusses the need to include the affective
domain in performance tests since many jobs depend on such things as attitudes and ethics;
however, Tractenberg cautions that there are legal problems relating to the student's right to
privacy when attitudes are included in the test items.

The first objective of vocational education graduates is to be employed, but they soon wish
to advance to positions requiring greater skills, better human relations, and leadership ability.
While the earlier writers do not stress need for leadership development, Borow states that
"performance tests should be chosen and administered to measure competencies related to the
aims of broad, liberal education as well as those of work."³

Employers apparently want workers with skills, but in line with the "general education view"
of vocational education, they also want employees with job adaptability and advancement
capabilities. Performance tests strive to simulate the actual job situation, but final evaluation may
have to come with follow-up studies of both the employers and the graduates who have been
placed on the job.

Student Needs. To vocational educators, social service agency personnel, and advocacy
groups of various types, vocational education can be the answer to the employment problems of
most people. However, the goals of serving industry and meeting the needs of students are often
in conflict. For instance, Borow notes the conflict between an open admissions policy and the
use of certifying examinations. An open admissions policy is "humanistic," while student
certification supports a "human resources" view. In addition, Pullin and Tractenberg agree there
are problems associated with performance tests for student certification; i.e., in establishing
performance standards, educators must maintain integrity with employers and, at the same time,
be aware of the possibility of discrimination against the student because of socioeconomic
background, race, or sex.

Performance tests must be constructed to protect the rights of all students. Those who view 
vocational education as a "social reform" program see this as a major issue. In no case should 
performance tests discriminate on the basis of race, sex, handicap, or membership in a special 
population. Pullin and Tractenberg point out that using "instructional management and 
decision-making" for evaluation presents problems since the remedial program indicated by the 
diagnostic test could segregate the groups by sex, race, or type of handicap. Performance 
tests for summative evaluation can present a problem when classes or institutions have a
disproportionate enrollment of special populations. The expectations for successful program
completions may have to be altered when a large number of students are academically, mentally, or
physically handicapped.

Institutional and Curriculum Concerns. Administrators of vocational programs must be 
concerned about the use of performance tests in their institutions. A good deal of controversy 
surrounds the uses of performance tests and who makes the decisions regarding their use. 
Performance tests may be good, but Borow raises the question, "for whose good?"






Teachers may not object to formative program evaluation when the purpose is to make
program adjustments and curriculum improvement. Students may not object to "instructional
management and decision-making" evaluation as long as it is used for prescriptive programming
for instruction, but summative program evaluation affects the teacher personally if the results
indicate program termination. Student certification is also viewed with alarm by students who
have spent up to two years in a program and then are rejected from the occupation by a final
performance test. These kinds of serious concerns require resolution.

Institutional administrators must also be concerned about the cost of performance testing
and the time allotted to testing. Finch stresses the need for performance testing to become a part
of the instructional program with time blocks, space, equipment, and personnel assigned to this
task. The military has used this approach for years and assumes it to be an important function of
the instructional process. The competency-based vocational education movement incorporates
performance testing concepts in the instructional program, since each competency must be
mastered to the desired standard before the student can be recognized as having completed the
task. Administrators and instructors must clearly identify the relationship between the
competency-based vocational education curriculum and performance testing.



Response of Vocational Education to the Issues

In this section, the authors deal with the response that they feel vocational education must
make to the philosophical, technical, legal, and implementation issues associated with
performance testing. The topic is dealt with in six major subdivisions: (1) philosophical adoption
of the concept, (2) test development and administration, (3) uses of performance tests, (4) access
and equity, (5) curriculum improvement, and (6) implementation of performance testing.


Philosophical Adoption of the Concept 

The fact that Willers and Borow did not quite reach agreement on a philosophical base for
performance testing points out the need for each vocational education agency to proclaim its
own philosophy of education formally before initiating performance testing. To be successful in
this endeavor, educational leaders must develop general goals of education, including
vocational education. These goals need not be measurable; in fact, the major purpose should be
to set a direction for the organization that is consistent with its basic philosophy. Only those
institutions that believe in job training should attempt to develop performance objectives for
vocational education. Vocational educators should develop specific, measurable course
objectives that are based on actual job needs and on well-established general goals.

While some narrowly define performance tests as measures of psychomotor skills only,
developers and users of such tests would be well advised to include cognitive competencies and,
when the technology permits, the affective domain. It should be noted that the regulations for
P.L. 94-482 indicate a need to measure "students' skills, knowledge, attitudes, and readiness for
entering employment successfully." This challenges educators to develop measures to address
the "whole person." When performance tests do not measure the cognitive and affective domains
adequately, vocational educators should supplement the test with other methods of evaluating
these domains.

There is no merit in having a "pure" performance testing system if it does not meet the
needs of the student and the institution. State and local vocational agencies should supplement
performance testing by developing and implementing an extensive follow-up system. Such a
system should determine the extent to which vocational graduates are placed in the occupations
for which they are trained. The follow-up should also assess the extent to which employers are
satisfied with the training received by their employees. Analysis of such data should be useful in
supplementing performance testing. Since future funding may be contingent on how well
graduates perform in the actual occupation, this type of data could prove to be invaluable.

Attention should by now have been directed to one major reason why performance testing
should be adopted, and while many good reasons may be discussed, one top priority must be
the desire to achieve accountability. Accountability is the dominating force in modern
decision-making at the policy, legislative, and budgetary levels. Regardless of which of the views
of vocational education are held by educators (most probably accept a combination of all four),
vocational education does deal with selecting, preparing for, and securing a job. Vocational
education assists people in moving from a life focused around school to a life focused around a
job. It serves to bridge the gap between school and work for many people. To this end,
accountability deals with the extent to which the program assists students, through successful
employment, to become contributors in the economic system.

Agencies and institutions that recognize the basis for performance testing and are willing to
supplement testing with other appropriate measures should find it beneficial in
documenting the accountability of vocational programs to the public and to the policy makers.

Test Development and Administration. Vocational education must respond to the technical
aspects of performance testing by developing acceptable measurement instruments and
administering these tests in a manner that stands scrutiny by professionals in the testing field.
The performance tests must meet the tests of validity and reliability.

Klein and Perloff discuss the relative difficulty of developing performance tests. Vocational
education performance tests should be based on actual occupational needs and be
representative of on-the-job situations. In this regard, much work has already been done that
should ease the developmental process. The Vocational-Technical Education Consortium of
States (V-TECS) has developed many catalogs of performance objectives through a rigid
research process that ensures that the most important tasks performed by workers are included.
If both the curriculum and performance tests were developed using the approach
of V-TECS, the effectiveness of the developmental process, as well as its cost, should be more
pleasing to administrators.

Performance tests may vary in their degree of sampling, but the critical aspect should be
predictability of the test. A variety of testing approaches, such as direct work observation, work
sample, and simulation, should be used to ensure that the performance tests are
viewing the students as they should function in the actual job setting. Tests should be criterion
referenced in order to measure the level of competence against the standards of the occupation.

Uses of Performance Tests. Each segment of the vocational education community must
carefully study Slater's purposes of performance testing and identify those areas that are
most important in its program. For example, performance tests given before student enrollment
in a program may be used for screening or diagnostic purposes. However, screening will be
permitted in only a very few programs operated by public educational institutions. The legal
issues noted by Pullin and Tractenberg can generally be avoided if the tests are used for
diagnostic purposes, in order to prescribe a meaningful instructional program for each student.






In addition to performance testing before student enrollment, tests can also be very valuable
during the course of student programs. For instance, during a program, performance tests can
be used effectively for both student and program diagnostic purposes. En route testing of skills
should reduce the likelihood that students could spend months in a program only to learn near
the end of their program that they are unable to pass the performance tests. Also, with student
diagnostic tests, provisions can be made for remedial programs and services early in the
program. Performance tests for program evaluation purposes should direct teachers and
administrators to make program adjustments without long delays.

Finally, administration of performance tests at or near the end of the program permits both
the certification of students and summative evaluation of programs. In the future, there may be
more occupations for which licensing tests are mandated. In the meantime, vocational educators
can use performance tests as a means of describing the tasks that students can perform. The test
score may not always be used to determine successful completion of a program; rather, the score
can describe students' skills when they leave the programs. The end test can also be used to
make program changes and, in some cases, terminate programs not meeting standards.

Vocational educators, educational planners, and legislative bodies must use care in analyzing
the results of performance tests. Test data can be very useful in improving vocational programs;
however, the tendency must be resisted to misuse the data in ways such as limiting enrollment of
those predicted to fail by the performance test or originating programs based solely on test
performance of the graduates. Care must also be taken not to misuse the concepts of
performance testing, i.e., abusing the rights of students and teachers by expecting more from the
results than the test is capable of giving.

Access and Equity. The problems of access and equity are often created by inappropriate
and unrelated criteria for entrance or acceptance in a program or a job. Sex or race are not
appropriate criteria for assessing ability to do a particular job. The concepts of performance
testing should provide an opportunity to overcome many of the issues of access and equity.
Properly validated performance testing (not race, sex, socioeconomic background, or other
discriminatory criteria) should measure ability to perform the job. Graduates of vocational
programs who possess certification that they possess the competency necessary for a particular
job have a valuable bargaining tool in seeking employment. Certification provides an opportunity
to focus the employment interview on documented competence, rather than on social bias.

The concepts associated with validation of performance testing must provide assurance that 
there is a direct correlation between the content of the instructional program and the content of 
the test. Whether students are admitted to or complete the program should be based upon their 
ability to perform identified tasks and not upon other unrelated criteria.


If it is used properly, the performance test will enhance education rather than victimize
students and instructors. Proper use can be accomplished by adhering to the guidelines for
fundamental fairness, due process, and equity as described by Pullin and Tractenberg.
Statewide standards, established by a recognized governmental agency, administered
responsibly, and used properly, should promote access and equity in vocational education.

Curriculum Improvement. The greatest value in performance testing may be its potential for
improving the instructional programs. Competency-based vocational education programs are
based on the same job analysis concept as performance testing. Rather than simply sampling job
skills, the competency-based curriculum requires that students be tested on objectives for all job
skills associated with their program of study. The catalogs of performance objectives from
V-TECS can be used to produce competency-based curricular materials and performance tests.




With the development of performance testing, many educational institutions now recognize
and grant credits for competencies that students have acquired outside the institutional setting.
Learning does not begin and end with formal schooling in an institution. The need to address
this, as far as credentials are concerned, has been a recent development. It may be, in part, a
response by educational institutions to the problems of declining enrollment, to the desire of
many adults to return to school for more formal education, and to the need to articulate programs
between levels of education. At any rate, performance testing provides an opportunity for
vocational educators to serve the needs of students and employers better as well as add
efficiency to the vocational education system. With well-validated test items, students may skip
parts of the instruction in areas in which they have developed competence from other
experiences. Education interrupted by personal situations or family needs may be resumed
without loss of time and resources.

The development of performance testing may lead to performance contracting to provide
vocational education services. Private industry can identify specific groups of people who need
specific competencies. Contracts can be negotiated with educational institutions to provide these
services with the understanding that if the students do not perform, the budget will be reduced
accordingly. By using these concepts, vocational education programs may assist governmental
agencies seeking to solve problems such as youth and minority unemployment and training for
displaced homemakers.

This type of "individualization" of the curriculum to fit the needs of students can also be
achieved by fitting the instruction to the learning rate and style of the individual student.
Performance testing can allow the students to progress at their own rates and the instructors to
select teaching strategies best suited to the needs of each individual student. In addition,
performance testing provides the instructor, as well as program evaluators, some means of
assessing the extent to which each student achieves the desired goal (employability) regardless
of the route taken to that end.

Implementation of Performance Testing. Vocational educators tend to do things in a
systematic, orderly manner and, consequently, usually have much success in implementing new
programs. However, the implementation strategies suggested by Milward and Finch should even
further improve the possibilities of successful implementation of a new concept in an existing
program. Implementing performance testing will be easier if Milward's "street level bureaucrats"
are in support of the concept. To involve the teacher in the basic inservice program will meet the
criteria of intervening as closely as possible to the level of the delivery system. Total involvement
of students, parents, faculty, administrators, and the community at large is most desirable.
Perhaps the local administrator, more than any other person, has the greatest influence on
successful implementation of any educational concept. The administrator can assist staff
members to do backward mapping in planning for implementation.

At the state level, vocational education must respond by providing leadership in
implementation, including enthusiastic promotion, inservice training of staff, and, most
importantly, assurance that adequate funding is available from some source. Mandatory
requirements for performance testing should be avoided, and some differential reward or some
other palatable means should be used to secure local cooperation in implementation.




IMPLICATIONS OF
THE ISSUES



Conclusion 



Obviously, there are many issues to be considered concerning the why, who, and when of
performance testing in vocational education. Certain philosophical, technical, legal, and
implementation issues remain to be answered if performance testing is to be useful and effective
as a professional tool to enhance the teaching/learning process.

The vocational community of administrators, teachers, counselors, teacher educators,
curriculum specialists, and others must respond to the challenge as they have on so many other
occasions. While some, no doubt, will reject performance testing altogether, others will find its
appropriate use in their own vocational educational agencies and institutions.










Notes

¹Stephen J. Slater, "Performance Testing: An Overview."

²U.S. Department of Health, Education, and Welfare, Office of Education, Federal Register, Vol.
42, No. 191. Washington, D.C.: U.S. Government Printing Office, 1977.

³Henry Borow, "Performance Testing and Social Responsibility: An Issues Analysis."










Implications of Performance Testing
on Vocational Education 

Nellie Carr Thorogood 
San Antonio College 

San Antonio, Texas 



Performance testing has been defined in this handbook as an applied testing process that is
designed to measure performance on tasks requiring the application of learning in an actual or
simulated setting (see Slater's discussion in Chapter One). Vocational education performance
testing has chiefly been defined as a measure of competency in some specified field of
occupational or career training, according to Borow in Chapter Two.

In a period of history in which economic impact and development are of major concern to the
nation at large, education is being asked to provide more experiences related to the workplace.
Richard Bolles indicates that "work and education alike have as their common task the business
of teaching, refining, and using skills and knowledges."¹ Perhaps more than ever, there is
increasing demand for vocational education to be more responsible for this economic
development by providing greater reality to the workplace, and facilitating education to
individuals. Performance testing is a clear route to the measurement of outcomes to be
achieved by vocational education students, instructors, and programs. However, the use of
performance testing in vocational education is not without implications and concerns. This paper
will attempt to review the issues and the major implications for the utilization of performance
testing in vocational education.

Stakeholders in Occupational Education and Performance Testing



In his book People at Work, Pehr G. Gyllenhammar introduced the term stakeholders to refer
to persons or groups who have a "stake" or "interest" in the achievements and well-being of the
company. He wrote:

"The company must administer the resources with which it is entrusted . . . to create
economic growth, taking into consideration all the interest groups involved with the
company. This includes consideration not only of the stockholders and the managers,
but the customers, the suppliers, the employees, the government, and society as a
whole."²

Stakeholders in vocational education could include students, taxpayers, practitioners
(teachers, administrators, counselors), state governments, federal government, employing
institutions, and the community at large. The issue papers presented within this handbook
indicate these stakeholders in the vocational education program will be involved in the
performance testing process. Involvement of stakeholders in the implementation of performance
testing indicates the need for practitioners to consider the following types of activities:

• A clearly defined plan for the use of performance testing within the vocational education
program.






• Articulation of the goals of this process among the key groups involved: students,
secondary and postsecondary schools, businesses, industries, governmental agencies, and
communities at large.

• Clarification of the relationship of the ongoing programs to the new process. How does
performance testing relate to the existing program? What will be the use of performance
testing in admissions to programs, progression through the programs, and graduation?

• Active solicitation of involvement and commitment from the stakeholders in the new
process.

• A continuous flow of information to those concerned with the utilization of performance
testing.

The primary stakeholder affected by performance testing in vocational education is the
student. The implications of performance testing for the students are that the skills, knowledges,
and competencies intended to be mastered can be measured and verified via performance
testing.

Performance testing involves the student in an active role within the measurement
process: the student is asked to perform, to show mastery of skills and knowledges.

In the book Carl Rogers on Personal Power,³ several trends are identified that appear to be
occurring:

• Toward the exploration of self, and the development of the richness of the total, individual,
responsible human.

• Toward the prizing of individuals for what they are, regardless of sex,
race, status, or age.

• Toward human-sized groupings in our communities, our educational facilities, our
productive units.

• Toward a more genuine and caring concern for those who need help.

• Toward creativity of all sorts, in thinking and exploring.

These represent exciting trends, ones that are appropriate to vocational instructors and their
students. These trends represent the need for the human being to be literate, to be functional, to
be productive, and to integrate into the environment in which he or she lives. Performance
testing can be a positive accountability process for students while they are in a vocational
education program, but more importantly it can be a valuable process to use throughout life in
assessing one's ability to perform. Much of any occupational task is performance, and most of us
are completing a performance test daily.

Abraham Maslow⁴ described a series of assumptions concerning human beings in his book
of notes entitled Eupsychian Management. Some of these assumptions are of importance to the
practitioner, both instructional and administrative, who will implement performance testing:

• Assume everyone is to be trusted. 






• Assume everyone is to be informed as completely as possible.

• Assume in all employees and students the impulse to achieve.

• Assume that people are improvable.

• Assume that people prefer to feel important, needed, useful, successful, proud, respected,
rather than unimportant, interchangeable, anonymous, wasted, unused, expendable,
disrespected.

• Assume a tendency to improve things, to make things better, to do things better.

• Assume preference for being a whole person and not a part, not a thing or an implement,
or tool, or "hand."

• Assume the preference for working rather than being idle.

• Assume all human beings prefer meaningful work rather than meaningless
work.

• Assume the preference for personhood, uniqueness as a person, identity.

Utilizing these assumptions places all of the stakeholders in an active, constructive,
participative role rather than a passive, accepting, or destructive role. The student is actively
involved in mastering the skills, knowledges, and competencies. The practitioner is actively
involved in linking the student with the occupational setting through appropriate and meaningful
instruction. Finally, the publics are actively involved in the input to instructional processes as
well as in employment of the students.

The intentional outcome of performance testing can be: 

• Improved student skills, knowledges, and abilities.

• Improved measuring and accountability processes for occupational instruction.

• Improved productivity at the occupational job site.

The by-product of performance testing is the focusing by all stakeholders on improved
human competence.

On Competence 

The overriding implication from the issues presented in this handbook is the idea that
vocational educators who use performance testing will have to continually focus on quality,
quantity (productivity), and costs of this process. Quality will have to be concerned with
accuracy, as well as accomplishment beyond mere accuracy such as market value, quality
judgment points, physical measurements, and quality of "worklife" ratings. Quantity will need to
include the rate of productivity, the timeliness of the criteria utilized, the appropriateness of ways
utilized, and the volume of the "how many" question. Cost factors will include human resources
for research, development, implementation, and revision; materials involved in research,
development, implementation, and revision; and the management involved in supervision,
information flow, and assessment of the process.

No matter what the issues concerning performance testing in vocational education programs,
the focus will need to continue to be competence: competence of knowledge, competence of
skills, and competence of applications. Thomas Gilbert defines human competence as a function
of worthy performance.⁵ If vocational education leads to competence and competence is linked
to performance, then at some point in time vocational education must be concerned with the
assessment of performance.

Uses of Performance Testing in Vocational Education

If vocational education is to provide students increased opportunities for employability, three
critical uses can be made of performance testing: advisement, instructional
assessment, and certification of competencies. These uses of performance testing can occur in
classrooms, laboratory settings, simulations, or at the workplace.

It is important to keep in mind that performance testing is but one part of the advisement
process; is but one part of the instructional process; and is but one part of the certification
process. However, it can provide the basis for the planning of the [illegible]
process. The general goal of vocational education is access to employability. The general goal of
performance testing in vocational education can be to provide clearer advisement; clearer
guidance and direction in instruction; and more realistic certification of competencies to facilitate
access to employability.

Kenneth Hoyt defines employability to include the following skills, knowledges, and abilities:⁶

1. the basic academic skills of mathematics, oral and written communications

2. good work habits leading to productivity in the workplace

3. a personally meaningful set of work values

4. a basic understanding of the American economic system

5. an understanding of one's own vocational interests, aptitudes, and abilities as well as
opportunities

6. skills needed to choose a career

7. job-seeking, job-getting, and job-holding abilities

8. discovering unpaid work as a productive way to spend leisure time

9. capacity to affect positive changes in occupational society

10. skills needed to humanize the workplace and move up an occupational ladder






Utilizing performance testing as an end unto itself is a narrow approach
and would have significant legal implications for practitioners. Performance testing needs to be
one of the valuable tools of process in focusing occupational education on human competence.
Performance testing must be part of an instructional process that includes clear identification of
intended outcomes; utilization of appropriate materials, strategies, and experiences to facilitate
the intended outcomes; and application of appropriate procedures and instruments to assess and
measure the student progress (performance testing can be one of the most appropriate
procedures). Once the student has completed this instructional process, the individual, the
instructor, and potential employer will have clear information concerning the skills, knowledges,
and abilities that have been achieved.

The challenge for vocational education programs within this decade appears to be to
maximize the resources available in order to provide the best quality of programs to a diverse
clientele. The programs will have to be flexible to meet the diversity of student needs. Many
innovations, accountability structures, regulations, and guidelines have been suggested in order
to facilitate the vocational educator's ability to produce this maximization.

However, one of the educator's overriding needs to meet this diversity and challenge will be
improved information. Improved information has the potential for creating greater competence in
the day-to-day implementation of vocational education. The process and product of performance
testing can be one vehicle to improve such an information flow.

Review and Implication of Issues 

The implications of the issues presented in these papers can be reviewed by identifying the
issues that are internal and external to the institutions that provide vocational education. Given
the definition of performance testing presented by Slater, the following factors are important to
issues that are internal to the implementation process: organization type, technology, purpose of
testing, task to be accomplished, and organizational resources. In addition to these factors, there
are also factors external to the implementing organization (environmental factors) that will have
significant implications for the implementation of performance testing in vocational education.
The external factors include technical, political, economic, legal, social, cultural, historical, and
philosophical arenas. These internal and external factors will interact and impact the
implementation of performance testing.



Internal Factors Affecting Implementation of Performance Testing 

Concerning the identification of the implementation factors internal to the organization, the 
issue papers indicate the following: 

• The purpose of performance testing for vocational education. This is the central and most
critical factor. The purpose of the utilization must be identified and clearly defined for the
implementing organization. The purpose needs to be clearly articulated.

• The tasks that are involved in the implementation process to fulfill the goals and the
purpose.

• The practitioners who will perform the tasks. 






• The resources that will be needed to perform the tasks, including human resources,
physical resources, and fiscal resources. Resources include those that are internal to the
organization and those that may come from business, industry, and other areas of expertise.

• The technology that is necessary to perform the tasks, including the scientific content, the
methodology content, and the process.

• The organization type and structure. The organization type can be a local school or training
center, a school or college district, a state agency, a federal agency, or professional
association. Organization structure will include all of the factors that are considered within
the implementing structure: authority, decision levels, and so forth.

It is important to note that all of the factors internal to the organization depend upon a clear
identification of the purpose of performance testing in vocational education. Once the purpose is
clearly identified, then the practitioners are responsible for implementing the tasks with the
highest amount of technology within the constraints of the organization's type, structure, and
resources.



External Factors Affecting Implementation of Performance Testing

The presented topics have dealt primarily with the philosophical, technical, social, and legal
issues confronting the implementation of performance testing within vocational education. There
are additional issues in the implementation of performance testing, including political, economic,
cultural, and historical forces and factors. The latter will be defined briefly, but need to be
considered in detail for future study.

Historical Forces and Factors. As with the utilization of any major technology and
phenomenon, the historical elements are to be valued. Major historical issues impinging on
vocational education and performance testing include the following:

• The traditional ways of preparing for work, that is, (1) organized apprenticeship, either
voluntary or involuntary; (2) family teaching of a trade or craft;⁷ and (3) the pick-up method
by observation or imitation.

• The concept of the educated worker, both in the area of liberal arts and in occupational
learning, has been a theme of vocational education since the beginning of the 20th century.

• The concept of performance as a measure of work productivity.

• Federal and state legislation.

• Technological developments.

• Knowledge development concerning: (1) the ways in which people learn, and (2)
methodology of diagnosing learning needs and learning occurrence.

• Vocational education's intention to relate to people and the work they do.

• The belief in the reality of individual differences in personal competencies and in the ability
to observe them (see Borow, Chapter Two).






• Applications of institutional testing during the early part of the twentieth century including
(1) intelligence testing of children, (2) employment testing in industry, (3) objective testing in
the schools (see Borow, Chapter Two).

• Continuous use of performance testing in the U.S. military.

• Performance testing as the oldest form of evaluation of individual achievement (see Klein. 
Chapter Three). 

Political Forces and Factors. Implementors of performance testing must always be aware of
the political implications: power, control, "ownership" of standards, attitudes of major groups
with a vested interest, opinion, and reactions to implementation tasks and technology.

Cultural Forces and Factors. Practitioners will need to consider cultural norms, cultural
values, work place values and ethics, subcultures within society, public attitudes, social and
cultural groups' practices, and so forth.

Economic Forces and Factors. Whether the setting is a public or private institution, the
general economics of the implementation process and tasks will need to be considered. The
implementors must also be aware of the well-being of the general economy. For example, if
additional financial resources will be required by an institution to implement performance testing,
where will the funds be generated? What is the general economic indicator of the time? What is
the unemployment rate?



All of these forces and factors need to be studied in greater depth for their implications for 
performance testing in vocational education. However, some of the most important factors and 
issues are found within the philosophical, technical, social, and legal arenas. The contributions 
and constraints to the implementation of performance testing from these areas have maximum 
implications. 

Philosophical Forces and Factors. The values, ideals, ethics, and concepts that exist both
internally and externally to vocational education will have direct implications on the use of
performance testing. An exceptionally critical impact will be in policy-making at all levels and
specifically within policy-making concerning the definition of the purpose of performance testing
in vocational education. Philosophical issues include the following:

• Ideals of the performance testing models to be utilized.

• Integrity of the measures of competence.

• Commitment to the purpose and to the technical methods.

• Concepts and endeavors focused on the total well-being of the individual student.

• Conceptual purpose of performance testing to the whole of vocational education processes.

• Performance testing interface with ideals of the society such as democratic ideals, national
priorities, welfare of the individual, worth of education, mission of schools, and open
admission policy to institutions and programs.

• Concentration on outcomes of students, personnel, and programs.






Technical Forces and Factors (applicable to both the internal and external factors). Major
technical issues of performance testing that can impinge on vocational education include:

• The process for identification of competencies.

• The setting of standards.

• Objectivity.

• Validity: criterion, content, construct, consistency.

• Reliability.

• Application of performance testing (diagnostic; advisement; assessment; evaluation; or
certification).

• Utilization procedures (purpose, policy, and operational).

• Costs (dollar resources, human resources, time, physical resources).

• Quality of the competencies established and standards set.

Other technical issues include:

• The need for the performance tests to closely duplicate reality.

• The need for the skills, knowledges, and competencies required in an occupational field to
be identified by persons in the field.

• The need for realistic, supportive test-related materials: instructional experiences and
materials; laboratory experiences; simulations; work experiences.

• Observer and rater variability.

• Standardization.

• Efficiency of process, products, and procedures.

• Currency of tests in relationship to reliability and validity development process.

• Security of tests.

The actual construction of a performance test is a sophisticated and critical process. The
steps offered by Klein are worth reviewing because the thoroughness aspect of the test
development process has major implications for vocational education. The technology of this
process will impact on all stakeholders in vocational education. It is important to consider this
process as both dynamic and continuous if the performance testing used in vocational education
is to be realistic to the workplace.




Social Forces and Factors. A variety of social issues can have an impact on the use of
performance testing in vocational education. Most of these issues focus on the welfare of the
individuals or groups and the ideals of the society. These issues include:

• Diversity in the needs of populations to be served: age, sex, ethnicity, learning abilities and
disabilities, and various gradations of economic status.

• A move to look beyond just the needs of high school age youth to determine what is
expected of vocational education.

• Ability to cope with nontraditional students.

• Technological displacement of employed persons.

• Learning experiences that occur as part of the normal process of work, community service,
and life.

• Economic development and maintenance of communities in specific and of the country in
general.

• Expected linkages between education and the place of work.

• Expectations of testing purpose (formative and summative program evaluation, instructional
management, programming, and decision-making; and student diagnosis, advisement,
achievement, and certification).

Legal Forces and Factors. Legal forces that surround performance testing in vocational
education include the legal framework of the school; the local, state, and federal laws; decisions
of the courts and quasi-judicial bodies; and decisions and standards of regulatory agencies. In
Chapter Four Tractenberg identifies seven legal concerns related to performance testing. They
stem from:

• Federal and state due process.

• Federal and state equal protection clauses.

• Federal and state clauses protecting privacy and freedom of belief.

• State education clauses.

• Statutory laws.

• Regulatory laws.

• Common law.

Pullin identifies the major legal issues that are of concern in performance testing to
include: student, personnel, and program accountability; due process in the use of performance
testing; discrimination in the use of tests; and the right to privacy.






Implementation Forces and Factors. As Milward and Finch reveal (see Chapter Five), the
implementation process is a complex one. Therefore, it is important to review from the papers
some of the key implementation issues that will impinge on the use of performance testing in
vocational education. These implementation issues include:

• Overall policy guiding the implementation process. What are the goals of intended
outcomes? From what level are these goals generated: local, state, federal, other?

• Time involved in the development, implementation, evaluation, and revisions of the processes
and procedures.

• Costs of resources necessary for effective implementation.

• Quality of competencies, standards, tests, and utilization techniques.

• Quantity of the competencies, standards, tests, and utilization techniques.

• Methodology utilized: feedback, guidance, complementary education and training,
reinforcement and remedial instruction, and assessing.

• The implementation setting including curriculum, teachers, support personnel,
administration, students, employers, and the community at large.

Each of these issues must be considered in relationship to the students, the practitioners,
and publics who will be involved in the process and procedures.



Conclusion 

In a period of time when lifelong learning, continuous development, career education,
high-level technology, accountability, and emerging occupations are more than just sets of
words linked together, the challenge for the utilization of performance testing within vocational
education is critical. Since performance testing is not new to vocational education, the true
challenge is to adapt performance testing to the diversities and demands currently being placed
on vocational education programs. In meeting the challenge of these demands, performance
testing may be used to assess prior learning and work experience so that the student can begin
at the most appropriate educational level. Performance testing will probably continue to be used
for certification in certain professions. Performance testing may be used for effective articulation
from secondary to postsecondary levels. And performance testing may be a vehicle of learning
that is most closely related to the work situation. After all, productivity in professions, in
businesses, in the trades, and in life generally is attuned to performance.

Therefore, vocational education programs, through (1) a clearly defined plan of
implementation; (2) a clearly defined plan of development and utilization of criteria; (3) a clearly
articulated flow of information to persons directly (students, practitioners, employers) and indirectly
(taxpayers, governmental agencies, and citizens) involved with the process; and (4) a continuous
feedback system of information, can effectively utilize performance testing as a product and as a
process of learning to achieve competence.

Performance testing is not a perfected process at this point in time. The potential use of
performance testing in vocational education will depend upon the direction of the future of the
work place; the direction of education processes; and, critically, the direction of technological
advancements, both high and appropriate technology. However, it has enough merits to be
continued, to be improved, and to be utilized as a transitional process until more appropriate
processes are developed. It is important not to let legal, social, cultural, and political constraints
hamper the use of performance testing in vocational education. Historically, the purpose of
vocational education has been educating an individual for gainful employment. A major vehicle
utilized to produce these skills and competencies was performance: the ability to show in the
laboratory or on the job an ability to produce and perform with competence. The implications of
the issues presented in these papers indicate that performance testing will continue to be a vital
alternative for vocational education. However, the practitioner of the future will be challenged to
be clear in the definition of competencies, knowledgeable and sophisticated in testing
methodology, and articulate in communicating all of the above to the stakeholders who have
interest in vocational education in general and in performance testing specifically.






Notes



¹Richard Bolles, "Training for Transition," Education and Work (Change Magazine Press, 1979).

²Pehr G. Gyllenhammar, People at Work (Reading, Massachusetts: Addison-Wesley Publishing
Company, 1977), p. v.
³Carl Rogers, Carl Rogers on Personal Power (New York: Delacorte Press, 1977), p. 28?.

⁴Abraham H. Maslow, Eupsychian Management: A Journal (Homewood, Illinois: Richard D. Irwin,
Inc., 1965), pp. 17-36.

⁵Thomas F. Gilbert, Human Competence: Engineering Worthy Performance (New York:
McGraw-Hill Book Company, 1978), p. 18.

⁶Kenneth B. Hoyt, "Employability: Are the Schools Responsible?" New Directions for Education
and Work 1 (Spring, 1978): 30.

⁷Melvin L. Barlow, "Implications From the History of Vocational Education" (Columbus, Ohio:
Center for Vocational Education, The Ohio State University, 1976), pp. 1-2.



IMPLICATIONS OF THE
ISSUES: COMMENTS



IMPLICATIONS FOR VOCATIONAL EDUCATION: 
A THIRD POINT OF VIEW 

Marvin R. Rasmussen 
Portland Public Schools 
Portland, Oregon

The purpose of the two papers in this summary chapter was to review the issues
surrounding performance testing and vocational education identified in the preceding chapters
and to bring into sharper focus the implications of these issues for vocational education. This
was no small task, and the two papers succeeded in accomplishing it to a varying degree. In the
course of these efforts they have provided valuable additional perspectives on many of the issues
discussed in the earlier chapters.

Thorogood's paper addresses some of the relevant issues. Others are unfortunately omitted
or given scant attention. Early in her paper she acknowledges the relationship between
competency-based education and performance testing. Both movements stem from the same
social and educational sources: loss of public confidence in education and recognition of the
special needs of the less academically talented students. Moreover, the two concepts are
logically linked in that the "life skills" outcomes sought in competency-based programs often
lend themselves well to performance testing and perhaps only to this form of measurement.

Two other related points that deserved more attention are: (1) performance testing needs to
be integrated into the instructional process, and (2) performance tests cost more than
conventional paper-and-pencil tests.



The crucial issue of the greater costs of performance tests as compared to standardized
tests is only hinted at. It would have been useful to point out that the performance test is a more
direct measure of student achievement and this tends to increase its validity and therefore its
usefulness. But this increase is purchased only at a substantial increase in the cost of testing in
dollars and time. Great care needs to be used in deciding whether there is a real increase in
validity and, if so, whether it is worth the increased cost over less direct but perhaps adequate
measures.

Thorogood's discussion of the legal issues in performance testing identifies the major areas
of legal concern and makes some useful suggestions for fairness and privacy. In reviewing
Tractenberg's paper, Thorogood notes the legal implications of the key technical issues in
performance testing.

Overall, Thorogood's paper was an incomplete but valuable contribution to the discussion of
the issues surrounding performance testing and vocational education. Spillman and Wade's
paper is comprehensive and insightful. Their valuable contributions would have been more
accessible, however, if they had organized their discussion of issues by the categories used in
the preceding chapters of this handbook. Thus, we would have had a discussion of each of the
philosophical, technical, legal, and implementation issues and their implications, capped off by a
summary of the implications for vocational education as well as for education as a whole.
Instead the section titled "Important Issues for Vocational Education" is divided into "Legal
Mandates," "Human Resources Needs," "Student Needs," and "Instructional and Curricular
Concerns." While most of the major issues presented in the preceding chapters are touched on
in the course of these discussions, it is difficult to hold them in perspective because of the
organization of this section.

The subsection on "Legal Mandates" illustrates the organizational problem. It starts off by
noting that there is no legal mandate for performance testing, moves to its role in program
evaluation, and jumps to an acknowledgment of a relationship between competency education
and performance testing. From there, the subsection moves to a brief discussion of student
certification and the legal implications. All of these are areas in which issues exist that should be
identified and discussed. However, the issues do not fit well beneath the heading "Legal
Mandates," and they lack supportive content due to this organization.

The subsection titled "Human Resources" has three paragraphs on that topic, but a final
two-sentence paragraph touches on two key issues in performance testing: (1) the directness of
the relationship of the performance measures to the job situation and (2) the need for follow-up
validity studies. The implications of the crucial first point for cost, validity, and
utility need to be discussed in detail, as does the second point on validity studies.

The legal issues surrounding performance testing are discussed briefly and somewhat
inappropriately in the subsection titled "Student Needs." These issues should have been
developed at greater length. For instance, the authors could have shown how performance tests
tend to require greater job relatedness and validity in testing, but the frequent use of raters
requires careful safeguards so that bias does not creep in.

The subsection on "Institutional and Curriculum Concerns" touches on the key issues of
cost and time required for performance testing, but it fails to offer help in deciding when the
greater cost and time are justified.

In the section titled "Responses of Vocational Education to the Issues," they note that
performance testing is not a panacea and there are times when other forms of testing are
preferable. The section would have been more comprehensive if they had also said something
about performance testing being only one more instrument in the growing arsenal of instruments
for pupil and program evaluation, and about its place in a balanced and comprehensive
evaluation strategy.

Spillman and Wade seem to support the notion that the chief contribution of performance
testing in vocational education will be to program accountability rather than pupil assessment. I
believe that it is a mistaken notion since the needs of accountability are already well served by
simpler and less expensive measures such as the proportion of graduates who obtain and retain
jobs, ratings of job supervisors, and so forth. I believe that it is in the areas of student needs and
job preference identification, instructional planning, and certification that performance testing will
make its major contribution.

The other sections in this paper touch on the key issues, including validity and reliability,
relative difficulty of development, effects on students, legal concerns, the need for clear task
analysis, and the desirability of avoiding mandates of the use of performance tests.
Unfortunately, the allusions to these issues are brief.


In summary, both papers attempt to tie together four complex issues involving performance
testing and their implications for vocational education. This is not an easy assignment and the
contributors are to be commended for their efforts. This comments paper attempted to highlight
and support points made by the contributors and, in some cases, to raise additional points.
Taken together, however, the three papers only touch the tip of the iceberg. Vocational
education is bound to review these issues time and time again as it designs and implements
performance tests. The vocational education system is complex and dynamic, and as it changes,
so must its evaluation methods.






CHAPTER SEVEN 



GLOSSARY






The handbook is replete with terms that may be unfamiliar to some of the audience. In response,
a glossary of important terms appearing in the papers was prepared. The definitions contained in
the glossary were drawn from the papers wherever possible. It should be noted that in some
cases the same term was defined by more than one author. In those cases, the briefest definition
was selected for inclusion.






GLOSSARY OF TERMS 



Behavioral Process: The way in which a task, duty or operation is carried out.

Behavioral Product: The outcome resulting from some form of behavior.

Change: Any alteration in the status quo.

Change Advocate: Some person, group or organization acting as an initiator in a "change process."

Common Law: The law of a country or state based on custom, usage, and the decisions and opinions of law courts.

Competency Based Education: The usage of competencies (skills, attitudes, values, or appreciation that is deemed critical to successful employment) as a basis for development of curriculum content: this encompasses making available explicit criteria for each competency, assessing competence in applied settings, having demonstrated competence serve as a determiner of student progress, and focusing on facilitation of student achievement of competencies.

Concurrent Validity: The relationship of a test with meaningful samples of behavior as criteria.

Consistency Validity: The extent to which a person's result on a test corresponds to the person's performance on a task which the test presumably assesses when both performances are measured at approximately the same time.

Construct Validity: The extent to which a test measures hypothetical concepts or qualities.

Content Validity: The extent to which the content of the test samples subject matter, skills or behavior which the test attempts to assess or predict.

Contrast Error: The tendency on the part of raters to see others as opposite to themselves.

Criterion-Referenced Tests: Tests in which an individual is assessed relative to a specific standard rather than to his/her performance relative to other individuals or to group norms.

Criterion Validity: The ability of a test to predict future school or job performance.






Critical Competencies: Skills identified as essential to adequately perform a specific occupation.

Direct Assessment: Direct observation in a real life setting.

Due Process: An individual's right to be treated with fairness, consistency, and lack of arbitrariness by governmental agencies and employers.

Educational Change: Any significant alteration in the status quo which is intended to benefit the people involved.

Equal Protection: A constitutional principle related to due process, prohibiting any state from denying to any person within its jurisdiction the equal protection of the laws.

Error Variance: The variability of measures due to random fluctuations, having a basic characteristic of self-compensation.

Face Validity: The apparent ability of a test to measure what it appears to measure.

Generosity Error: The error that results when raters overestimate the positive qualities of individuals they like.

Halo Effect: The effect that results when raters generalize their impressions from one rating to another.

Ideas in Good Currency: Ideas which become important by having an impact on the formulation of public policy.

Innovation: A product or practice new to the adopting unit (e.g., school system, classroom).

Mapping Backwards: A technique for arriving at an estimate of what will be needed to successfully implement a new program or practice.

Minimum Competency Testing: A standardized examination designed to demonstrate whether a student has reached a given level of proficiency in any one of several basic academic skills required to function in everyday adult life.

Norm: A standard of achievement as represented by the average achievement of a large group.

Norm-Referenced Tests: A task which seeks to compare an individual's performance relative to the average performance of a group of similar individuals.






Performance Test: (1) Refers to tests in which the test stimulus, the desired response, and the surrounding conditions approximate the reality of an actual situation drawn from a specific occupational or role-based content. (2) They assess a portion or all of an actual work setting by attempting to approximate the reality of the actual work setting.

Predictive Validity: The ability of test scores to relate to criterion measures which are based on some future performance.

Prima Facie: At first view, on the first appearance.

Procedural Due Process: The process that requires that the state act in a fair manner when it deprives a citizen of liberty or property.

Reliability: Whether the test measures a characteristic accurately and consistently.

Response Characteristics of Tests: Two response categories have been defined: (1) respondent behavior requires the examinee to choose from a limited set of clearly defined response alternatives; (2) operant responses are characteristic of behavior in real life situations, and hence do not have artificial, preconceived constraints limiting the behavior that might be observed.

Simulation: The process of abstracting some aspects of reality and concretely representing it in the form of a specific simulated task which examinees are expected to perform.

Standard Error of Measurement: An indication of the magnitude of error between "true" and observed performance. The larger the standard error the less confidence can be placed in the findings.

Standardization: The administration of a performance test to each student in an identical manner by means of: the provision of a handbook providing directions to both examiner and student; a set of jobs required by each candidate, including information of specific criteria, item insights and the amount of time usually required to complete each subunit of the test; and, a scale stipulating specific criteria.

Street Level Bureaucrats: Government officials, such as teachers, police officers, welfare workers or public health officers, who interact directly with the public and make decisions on the basis of individual initiative as well as established routine.

Stimulus Characteristics of Tests: A test which retains a set of instructions, a prompt, a demand, or an event that initiates the examinee's behavior.



Substantive Due Process: The process that requires that action of the state be rational and reasonably related to legitimate state objectives.

Surrounding Conditions: The environmental conditions under which a task is performed.

Targeted Consumer: Those consumers to whom the educational innovation is directed.

Test Bias: The characteristic of a test in being free of various types of content bias (e.g., numerical, role, status, stereotype and familiarity).

Validity: Refers to whether the test actually measures the characteristic that it claims to measure.

Verisimilitude: Performance tests in vocational education which take the form of work samples or job simulations closely resembling the actual on-the-job task to be performed by the worker, those tests having a high face validity.

Work Samples: Selected tasks performed under controlled environmental and time conditions. The aim is to standardize tasks and enhance replicability across examinees under conditions controlled and specified by the examiner.
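
The glossary describes the standard error of measurement only in words. As a hedged illustration, the short Python sketch below uses the usual classical test theory relation, SEM = SD x sqrt(1 - reliability), which is an assumption brought in here and is not stated in the handbook. The function names and the sample figures (a score standard deviation of 10 and a reliability of 0.84) are hypothetical.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability).

    score_sd    -- standard deviation of observed test scores
    reliability -- reliability coefficient of the test, between 0 and 1
    """
    if score_sd < 0:
        raise ValueError("score_sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, score_sd: float, reliability: float, z: float = 1.0):
    """Return (low, high): the observed score plus or minus z standard errors."""
    sem = standard_error_of_measurement(score_sd, reliability)
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    # Hypothetical figures, chosen only for illustration.
    sem = standard_error_of_measurement(score_sd=10.0, reliability=0.84)
    low, high = score_band(observed=70.0, score_sd=10.0, reliability=0.84)
    print(f"SEM is about {sem:.1f}; one-SEM band is about {low:.1f} to {high:.1f}")
```

Under these assumed figures, a less reliable test yields a larger standard error and a wider band around each observed score, which is the sense in which the glossary says that less confidence can be placed in the findings.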






CHAPTER EIGHT 



CONTRIBUTORS 



The contributors to this handbook were drawn from varied disciplines and professions in an
effort to address the issue of performance testing from a multidisciplinary perspective. Thus, while
the names and professional affiliations of some of the contributors may be familiar, others may
not. To provide a context for the reader, Chapter Eight consists of a brief biographical sketch of
each contributor.






Henry Borow (Ph.D., Pennsylvania State University) is a professor of psychological studies,
General College and College of Education at the University of Minnesota. He is the author of
over 100 journal articles, books, book chapters, and tests. Dr. Borow is a past-president of the
National Vocational Guidance Association and editor of its fiftieth anniversary volume, Man in a
World at Work (1964). He was a postdoctoral fellow of the American College Testing Program
and served on the national advisory board of the National Center for Research in Vocational
Education.

William G. Buss (L.L.B., Harvard University) is a professor at the University of Iowa College of 
Law. He has published extensively in the areas of educational law and constitutional law. 

Curtis R. Finch (Ph.D., Pennsylvania State University) is professor and chairman, General
Vocational and Technical Education, Virginia Polytechnic Institute and State University. He has
served on the faculties of Ohio State University and Pennsylvania State University. Dr. Finch has
served as editor of the Journal of Vocational Education Research and Occupational Education
Forum. He has authored or co-authored over seventy professional articles, papers, and reports
and is co-author of Curriculum Development in Vocational and Technical Education (Allyn and
Bacon, 1979). Dr. Finch served as a Senior Fulbright Lecturer to Cyprus during the first part of
1980.

Raymond S. Klein (Ed.D., State University of New York, Buffalo) is the program coordinator at
the National Occupational Competency Testing Institute (NOCTI), Albany, New York. He has
also served on the faculty of Pennsylvania State University and as the director of research for the
New York State Department of Education.

Samuel A. Livingston (Ph.D., Johns Hopkins University) is a program research scientist at the
Center for Occupational and Professional Assessment at the Educational Testing Service. He has
been involved in the area of performance testing for the past seven years, during which he has
developed performance tests for such varied occupations as firefighters, radiologic technicians,
dental assistants, dental hygienists, and machine tenderers.

H. Brinton Milward (Ph.D., Ohio State University) is an assistant professor of Business & Public
Administration at the University of Kentucky. He formerly served as associate director of the
Graduate Program in Public Administration at the University of Kansas. His published research
has been in the fields of organization theory and public policy. Dr. Milward is currently testing an
organizational theory of discrimination in colleges and universities. He also serves on the
editorial board of The Annals of Public Administration.

Evelyn Perloff (Ph.D., Ohio State University) is an associate professor of Nursing Research and
of Psychology in the School of Nursing at the University of Pittsburgh. She has also served as a
faculty member at Purdue University, Northwestern University, and Kendall College. Dr. Perloff
has published widely in the area of program evaluation.




Diana C. Pullin (J.D., Ph.D., University of Iowa) is a staff attorney at the Center for Law and
Education, Inc., Cambridge, Massachusetts. She has previously served as legal counsel for local
school districts and an intermediate educational agency. Dr. Pullin represented the students and
parents who successfully challenged the State of Florida's use of a minimum competency test to
deny high school diplomas in the federal court lawsuit Debra P. v. Turlington. Dr. Pullin's
previous publications have been in the areas of minimum competency testing and the law
relating to the education of children with special education needs.

Marvin R. Rasmussen (M.Ed., University of Oregon) is Director of District Programs for the
Portland (Oregon) Public Schools. He has served as the director of career education programs
for the Portland Public Schools and as a principal, administrative vice principal, and secondary
teacher.

Stephen J. Slater (Ph.D., University of California at Santa Barbara) has been responsible for
coordinating activities of the Clearinghouse for Applied Performance Testing (CAPT), an NIE
sponsored project at the Northwest Regional Educational Laboratory. In that position Dr. Slater
edited the CAPT Newsletter, prepared an extensive annotated bibliography on applied
performance testing, and organized the 1979 Annual CAPT Conference, entitled Alternative
Conceptions of Competence Assessment. Recently, Dr. Slater joined the staff of the Planning
and Evaluation Section, Oregon Department of Education.

Robert E. Spillman (M.S., University of Kentucky) is Director of the Kentucky Bureau of
Vocational Education. He has served in the capacities of acting deputy superintendent for
Occupational Education, secretary to the State Board for Occupational Education, and director
of Supporting Services Division in the Bureau of Vocational Education. Mr. Spillman has been a
secondary vocational teacher, teacher educator, and curriculum writer. He has written articles on
competency-based vocational education. In addition, he has been Kentucky's representative on
the V-TECS Board of Directors, serving as chairman of the organizing committee and Board
chairman for three years. Mr. Spillman and Dr. Wade have jointly been involved in several other
related activities. They were co-directors of a 1975-76 Region IV EPDA Workshop on CBVE and
co-authors of articles on CBVE and vocational student organizations. They participated in the
study, design, and implementation of one of the most comprehensive statewide programs of
CBVE. Kentucky's program, based on the V-TECS catalogs, currently involves 22 occupational
areas and has been implemented in 1,090 specific programs.

Janet E. Spirer (Ph.D., Ohio State University) is a research specialist at the National Center for
Research in Vocational Education. Dr. Spirer served as director of the project under which the
handbook was produced and edited the manuscript. Her research interests focus on human
resource policy and program evaluation.

John F. Thompson (Ph.D., Michigan State University) is a professor and chairman of the
Department of Continuing and Vocational Education at the University of Wisconsin-Madison. His
research interests and publications have been in the areas of philosophy of vocational education,
curriculum in vocational education, and inservice and professional development education.

Nellie Carr Thorogood (M. Bus. Ed., North Texas State University) is Director of Occupational
Education and Technology at San Antonio College. She has community college and university
work experiences as an instructor, cooperative education coordinator, program area coordinator,
division chairperson, and teacher-educator. She has served as a member of the Alamo
Consortium Private Industry Council and Youth Council; as a chairperson of a statewide
committee studying meeting the special needs of occupational students in Texas; as a member
of both state and national task forces on the impact of vocational education data systems on
postsecondary occupational education; and as an advisory member of several local employment
and education programs.

Paul L. Tractenberg (J.D., University of Michigan) is a professor of law at Rutgers School of Law
in Newark, New Jersey, where he specializes in public education law. Within that field, he has
taught courses and seminars at the law school, researched and written extensively, and
presented many papers and speeches to national, regional and statewide organizations. Prof.
Tractenberg has also consulted with many groups and established an ongoing public interest law
center to represent the interests of students and parents. Currently, he is especially involved in
assessing the legal implications of minimum competency and performance testing of students,
teacher competency measures, and school finance reform. Also, he is writing a book, under a
Ford Foundation grant, about the role of the courts in educational reform.

Charles D. Wade (Ed.D., University of Kentucky) is the director of the Division of Vocational
Program Development of Education (Kentucky Bureau of Vocational Education). He has served
as an RCU research associate, a program supervisor, a secondary vocational teacher, and a
part-time teacher educator. Dr. Wade has addressed a variety of national and state conferences
on such topics as program planning, competency-based curriculum, cooperative education, and
evaluation of vocational programs.

Jack C. Willers (Ph.D., University of Texas at Austin) is a professor of history and philosophy of
education at George Peabody College for Teachers of Vanderbilt University. He has held a
Fulbright-Hays Lectureship Award to Iran, Greece and Egypt. Dr. Willers has published widely in
several journals on philosophy and the social foundations of education and educational policy
issues.






EVALUATION PUBLICATIONS 
OF 

THE NATIONAL CENTER FOR RESEARCH 
IN VOCATIONAL EDUCATION 
ON EVALUATION 



EVALUATION HANDBOOKS SERIES 

• Guidelines and Practices for Follow-up Studies of Former Vocational Education Students 

• Guidelines and Practices for Follow-up Studies of Special Populations

• The Case Study Method: Guidelines, Practices, and Applications for Vocational Education

• Performance Testing: Issues Facing Vocational Education

• Evaluation Guidelines and Practices for State Advisory Councils 

CAREER EDUCATION MEASUREMENT SERIES

• Assessing Experiential Learning in Career Education

• Career Education: A Compendium of Evaluation Instruments

• Improving the Accountability of Career Education Programs: Evaluation Guidelines and
Checklists

• A Guide for Improving Locally Developed Career Education Measures

• Using Systematic Observation Techniques in Evaluating Career Education



VOCATIONAL EDUCATION OUTCOMES SERIES

• Viewpoints on Interpreting Outcome Measures in Vocational Education

• Vocational Education Measures: Instruments to Survey Former Students and Their
Employers

• Vocational Education Outcomes: An Evaluative Bibliography for Empirical Studies

• Vocational Education Outcomes: Perspective for Evaluation

• Vocational Education Outcomes: A Thesaurus of Outcome Questions

• Vocational Education Outcomes: Annotated Bibliography of Related Literature 



For information concerning the above publications, please contact:

Program Information Office
The National Center for Research in Vocational Education
The Ohio State University
1960 Kenny Road
Columbus, Ohio 43210
(614) 486-3655






